Interactive Voice Communications Network Entertainment.

FIELD OF THE INVENTION 
The present invention relates to interactive entertainment systems in general,
and more particularly to interactive entertainment over voice communications networks.

BACKGROUND OF THE INVENTION 
While the telephone today is used for point-to-point communications and voice 
transmission, telephone companies and content providers are looking for ways to use the 
telephone as a platform for mass media entertainment. The popularity of radio shows of the
1930s, 40s, and 50s, and television shows then and now, ought to provide a natural model 
for telephone-based entertainment, but this has not been the case. Part of the problem lies 
in the nature of radio and television shows as historically being non-interactive media, 
whereas telephones are interactive devices by nature. 


SUMMARY OF THE INVENTION 
The present invention discloses a system and methodology for the creation, 
delivery and operation of voice-based interactive and conversational entertainment over 
voice networks, such as the telephone network.

In one aspect of the present invention a method is provided for operating a
telephone entertainment program, the method including a) receiving a voice communication
from at least one caller, b) selecting audio output in accordance with an audio entertainment
program, c) presenting the audio output to the caller, d) prompting the caller for input at a
plot point of the audio entertainment program, e) receiving the input from the caller, f)
selecting audio output at least partly in accordance with the audio entertainment program
and the input, and g) presenting to the caller the audio output selected in step f).

In another aspect of the present invention the method further includes
performing steps d) through g) a plurality of times for a plurality of plot points of the audio
entertainment program.

In another aspect of the present invention the selecting step f) includes
applying decision logic to the input, thereby determining a state of the audio entertainment
program, and selecting the audio output at least in part according to a predetermined
association with the state.

In another aspect of the present invention any of the receiving steps includes 
receiving audio input. 

In another aspect of the present invention any of the receiving steps includes
receiving text-based input.

In another aspect of the present invention any of the selecting and presenting 
steps includes selecting and presenting text-based input. 

In another aspect of the present invention the method further includes 
maintaining a history of the caller inputs, and where the selecting step f) includes selecting
at least in part in accordance with the history. 

In another aspect of the present invention the method further includes operating 
a plurality of virtual performers, and where any of the selecting steps includes any of the 
virtual performers determining a state of the audio entertainment program and selecting at 
least part of the audio output according to a predetermined association with the state.

In another aspect of the present invention the method further includes operating 
a game simulation engine operative to apply decision logic to the input, thereby determining 
a state of the audio entertainment program, and select the audio output at least in part 
according to a predetermined association with the state.

In another aspect of the present invention the operating step includes applying
the decision logic in accordance with a rule structure of a game.

In another aspect of the present invention the operating step includes applying 
the decision logic in accordance with a predetermined outcome probability. 

In another aspect of the present invention the method further includes 
conducting the audio entertainment program for each of a plurality of callers, recording a
history of the interaction of each of the callers with the audio entertainment program, and 
providing access to the histories to any of the callers. 

In another aspect of the present invention the method further includes ranking
the callers according to a characteristic of the caller's interaction with the audio
entertainment program.

In another aspect of the present invention any of the steps are performed for a
plurality of callers within the context of the audio entertainment program.

In another aspect of the present invention a method is provided for constructing 
phrases from pre-recorded variants of speech elements, the method including a) selecting a 
pre-recorded variant of a first speech element from a group of pre-recorded variants of the
first speech element, b) selecting a pre-recorded variant of a second speech element from a 
group of pre-recorded variants of the second speech element, and c) constructing a phrase 
from the selected variants. 

In another aspect of the present invention the selecting step b) includes selecting 
where the second speech element associatively follows the first speech element.

In another aspect of the present invention any of the selecting steps includes 
selecting any of the variants at least in part according to a predetermined association with a 
relationship between a virtual performer and a caller. 
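The phrase construction aspects above can be illustrated with a minimal sketch. It assumes that each speech element has a group of pre-recorded variants keyed by a caller relationship label; the element names, relationship labels, and file names below are hypothetical and do not come from the specification.

```python
import random

# Hypothetical variant groups: each speech element maps to several
# pre-recorded clips (file names are invented for illustration).
VARIANTS = {
    "greeting": {
        "friendly": ["greet_warm_1.wav", "greet_warm_2.wav"],
        "neutral": ["greet_plain_1.wav"],
    },
    "caller_name": {
        "friendly": ["name_cheerful.wav"],
        "neutral": ["name_flat.wav"],
    },
}

# Which element associatively follows which (assumed ordering).
FOLLOWS = {"greeting": "caller_name"}


def select_variant(element, relationship, rng):
    """Pick one pre-recorded variant of `element` for the given relationship."""
    group = VARIANTS[element]
    # Fall back to the neutral variants if no clip matches the relationship.
    candidates = group.get(relationship) or group["neutral"]
    return rng.choice(candidates)


def construct_phrase(first_element, relationship, rng=None):
    """Build a phrase from a variant of the first element and of its follower."""
    rng = rng or random.Random()
    first = select_variant(first_element, relationship, rng)
    second = select_variant(FOLLOWS[first_element], relationship, rng)
    return [first, second]
```

Calling `construct_phrase("greeting", "friendly")` would return a two-clip list, one variant per speech element, which a playout component could then concatenate.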

In another aspect of the present invention a virtual theater architecture is
provided including virtual performer means operative to play the role of a specific character
in a telephone show, stage manager means operative to interpret a flow script of the
telephone show and send messages to the virtual performer means, each of the messages
being a directive of the flow script, and stage means operative to maintain state information
of the telephone show and receive behavior exhibited by the virtual performer means
responsive to receipt of any of the messages.

In another aspect of the present invention the architecture further includes a set
of behavior rules, and a behavior history, and where the virtual performer means is
operative to determine its own behavior by applying the behavior rules to any of the state
information, the incoming messages, and the behavior history.
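One way to picture the virtual theater architecture is the following sketch, in which a stage manager forwards flow script directives to performers, each performer chooses its behavior from its own rules and history, and the stage records the result. The class names, the rule representation (a predicate paired with a behavior label), and the example directive are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Holds show state and the behavior performers exhibit."""
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def receive(self, performer_name, behavior):
        self.log.append((performer_name, behavior))


@dataclass
class VirtualPerformer:
    """Plays one character and picks its own behavior from its rules."""
    name: str
    # Each rule is (predicate, behavior); the first matching rule wins.
    rules: list
    history: list = field(default_factory=list)

    def handle(self, message, stage):
        for predicate, behavior in self.rules:
            if predicate(message, stage.state, self.history):
                self.history.append(behavior)
                stage.receive(self.name, behavior)
                return behavior
        return None


class StageManager:
    """Forwards flow script directives to the named performer."""

    def __init__(self, stage, performers):
        self.stage = stage
        self.performers = {p.name: p for p in performers}

    def direct(self, performer_name, directive):
        return self.performers[performer_name].handle(directive, self.stage)


# Hypothetical host who greets once, then varies the line on later visits.
host = VirtualPerformer(
    name="host",
    rules=[
        (lambda msg, state, hist: msg == "welcome" and "say_hello" not in hist,
         "say_hello"),
        (lambda msg, state, hist: msg == "welcome", "say_welcome_back"),
    ],
)
stage = Stage()
manager = StageManager(stage, [host])
```

Sending the same `"welcome"` directive twice yields two different behaviors, because the second rule only applies once the first behavior is in the performer's history.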

In another aspect of the present invention a telephone entertainment system is
provided including a telephony interface operative to interface with a caller, speech/voice
processing means operative to interface with the telephony interface and receive input from the
caller, presentation means operative to interface with the speech/voice processing means
and prepare output at least partly based on the input, and a game engine operative to
interface with the presentation means and operate at least one virtual performer in
accordance with a flow script, thereby providing an output directive to the presentation
means for use in preparing the output.

In another aspect of the present invention a telephone entertainment system is
provided including a telephony interface operative to interface with a caller, speech/voice
processing means operative to interface with the telephony interface and including a speech
processor operative to perform automatic speech recognition on speech input received from
the caller, a template module for facilitating input and output via templates, an audio
playout module for producing audio output to the caller, presentation means operative to
interface with the speech/voice processing means and including means for preparing flow
script bubbles for output via the audio playout module, means for maintaining call state
information, means for populating pre-defined templates with links to audio content in
predetermined association with the bubbles and the call state, a game engine operative to
interface with the presentation means and including means for processing a flow script,
means for operating software agents representing virtual performers in accordance with the
flow script, and data storage means accessible to the game engine for storing and retrieving
any of game variables, user profile information, statistics, language models and behavior
information in association with the processing of the flow script.

In another aspect of the present invention a method is provided for processing
user input into an interactive telephony application architecture, the method including
submitting a request to a controller, the request representing interpreted input from a user,
the controller retrieving information from a presentation layer relevant to the input and
popping a session context from a presentation layer stack, retrieving a list of post-tasks of a
previous action and performing the post-tasks, pushing a new session context onto the
presentation layer stack, retrieving a list of pre-tasks, performing the pre-tasks, and
rendering output via a scripted template subsequent to performing any of the tasks.
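The request handling sequence above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Controller` and `PresentationLayer` classes, the shape of the action table, and the use of string formatting in place of a scripted template are all hypothetical.

```python
class PresentationLayer:
    """Keeps a stack of session contexts (hypothetical structure)."""

    def __init__(self):
        self.stack = []


def render(template, context):
    """Stand-in for scripted template rendering."""
    return template.format(**context)


class Controller:
    def __init__(self, presentation, actions):
        self.presentation = presentation
        # actions: name -> {"pre": [...], "post": [...], "template": str}
        self.actions = actions

    def handle(self, request):
        """Process one interpreted user input and return rendered output."""
        stack = self.presentation.stack
        # Pop the previous session context and run its action's post-tasks.
        if stack:
            previous = stack.pop()
            for task in self.actions[previous["action"]]["post"]:
                task(previous)
        # Push a new context for the requested action and run its pre-tasks.
        context = {"action": request["action"], "input": request["input"]}
        stack.append(context)
        action = self.actions[request["action"]]
        for task in action["pre"]:
            task(context)
        # Render only after the tasks have run.
        return render(action["template"], context)
```

Each call to `handle` therefore finishes the previous action before starting the new one, so post-tasks of one action always run before pre-tasks of the next.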

In another aspect of the present invention a flow script for an audio
entertainment program is provided, the flow script including a plurality of plot points, a
plurality of transitions between the plot points, a plurality of rules for determining
movement between the plot points and the transitions, and a plurality of output directives
associated with any of the plot points and the transitions.

In another aspect of the present invention the flow script further includes a
plurality of messages for delegating to a plurality of virtual performers responsibility for
determining actual output based on any of the output directives.

In another aspect of the present invention the flow script further includes a 
plurality of rules for determining movement between the plot points and the transitions
based on caller input. 

In another aspect of the present invention the flow script further includes a 
grammar for interpreting the caller input. 
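A flow script with plot points, transition rules, output directives, and a grammar might be represented as in the sketch below. The data layout, the plot point names, and the directive identifiers are assumptions for illustration; the specification does not prescribe a concrete format here.

```python
from dataclasses import dataclass, field


@dataclass
class PlotPoint:
    name: str
    directive: str                                 # output directive, e.g. a prompt id
    grammar: dict = field(default_factory=dict)    # spoken phrase -> token
    rules: dict = field(default_factory=dict)      # token -> next plot point


@dataclass
class FlowScript:
    plot_points: dict
    start: str

    def interpret(self, point_name, utterance):
        """Map raw caller input to a token using the plot point's grammar."""
        grammar = self.plot_points[point_name].grammar
        return grammar.get(utterance.strip().lower())

    def next_point(self, point_name, utterance):
        """Apply the transition rules; stay put if the input is not understood."""
        token = self.interpret(point_name, utterance)
        return self.plot_points[point_name].rules.get(token, point_name)


# Hypothetical three-point script with one branching plot point.
script = FlowScript(
    start="fork",
    plot_points={
        "fork": PlotPoint(
            name="fork",
            directive="prompt_left_or_right",
            grammar={"left": "LEFT", "go left": "LEFT", "right": "RIGHT"},
            rules={"LEFT": "cave", "RIGHT": "bridge"},
        ),
        "cave": PlotPoint(name="cave", directive="describe_cave"),
        "bridge": PlotPoint(name="bridge", directive="describe_bridge"),
    },
)
```

Unrecognized input leaves the caller at the same plot point, which corresponds to re-prompting rather than advancing the show.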

In another aspect of the present invention a telephone entertainment system is
provided including a) means for receiving a voice communication from at least one caller, b)
means for selecting audio output in accordance with an audio entertainment program, c)
means for presenting the audio output to the caller, d) means for prompting the caller for
input at a plot point of the audio entertainment program, e) means for receiving the input
from the caller, f) means for selecting audio output at least partly in accordance with the
audio entertainment program and the input, and g) means for presenting to the caller the
audio output selected in step f).

In another aspect of the present invention the means for selecting f) includes 
means for applying decision logic to the input, thereby determining a state of the audio 
entertainment program, and means for selecting the audio output at least in part according 
to a predetermined association with the state.

In another aspect of the present invention any of the means for receiving are 
operative to receive audio input. 

In another aspect of the present invention any of the means for receiving are 
operative to receive text-based input.

In another aspect of the present invention any of the means for selecting and
presenting are operative to select and present text-based input.

In another aspect of the present invention the system further includes means for
maintaining a history of the caller inputs, and where the means for selecting f) is operative
to select at least in part in accordance with the history.

In another aspect of the present invention the system further includes a plurality
of virtual performers operative to determine a state of the audio entertainment program and
select at least part of the audio output according to a predetermined association with the
state.

In another aspect of the present invention the system further includes a game 
simulation engine operative to apply decision logic to the input, thereby determining a state 
of the audio entertainment program, and select the audio output at least in part according to 
a predetermined association with the state. 

In another aspect of the present invention the game engine is operative to apply 
the decision logic in accordance with a rule structure of a game.

In another aspect of the present invention the game engine is operative to apply 
the decision logic in accordance with a predetermined outcome probability. 

In another aspect of the present invention the system further includes means for 
conducting the audio entertainment program for each of a plurality of callers, means for 
recording a history of the interaction of each of the callers with the audio entertainment 
program, and means for providing access to the histories to any of the callers. 

In another aspect of the present invention the system further includes means for 
ranking the callers according to a characteristic of the caller's interaction with the audio 
entertainment program. 

In another aspect of the present invention any of the means are operative for a 
plurality of callers within the context of the audio entertainment program. 

In another aspect of the present invention a phrase construction architecture is 
provided including a first group of pre-recorded variants of speech elements, and a second 
group of pre-recorded variants of speech elements, where the second group associatively 
follows the first group. 

In another aspect of the present invention a virtual theater method is provided
including operating at least one virtual performer operative to play the role of a specific
character in a telephone show, interpreting a flow script of the telephone show and sending
messages to the virtual performers, each of the messages being a directive of the flow script,
and maintaining state information of the telephone show responsive to behavior exhibited by
the virtual performers responsive to receipt of any of the messages.

In another aspect of the present invention the operating step includes applying
behavior rules of the virtual performer to any of the state information, the incoming
messages, and a behavior history for the virtual performer.

It is appreciated throughout the specification and claims that references to
telephones, telephone shows, telephone programs, and telephone networks may be
understood within the context of any system capable of conveying audio media, such as, for
example, voice-over-IP (VoIP) systems and packet-based telephony systems such as those
specified by GPRS, 3G and UMTS, and are not limited to existing telephone-based systems.



BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood and appreciated more fully from the
following detailed description taken in conjunction with the appended drawings in which:

Fig. 1 is a simplified flowchart illustration of a method for operating a telephone
entertainment program, operative in accordance with a preferred embodiment of the present
invention;

Fig. 2 is a simplified illustration of a flow script structure, constructed and
operative in accordance with a preferred embodiment of the present invention;

Fig. 3 is a simplified flowchart illustration of a method of implementing flexible
speech, operative in accordance with a preferred embodiment of the present invention;

Fig. 4 is a simplified pictorial illustration of a flexible speech association
structure, constructed and operative in accordance with a preferred embodiment of the
present invention;

Fig. 5 is a simplified conceptual illustration of a virtual theater architecture,
constructed and operative in accordance with a preferred embodiment of the present
invention;

Fig. 6 is a simplified block diagram illustration of a telephone entertainment
system, constructed and operative in accordance with a preferred embodiment of the
present invention;

Fig. 7 is a simplified block diagram illustration of selected elements of the
telephone entertainment system of Fig. 6, constructed and operative in accordance with a
preferred embodiment of the present invention;

Fig. 8 is a simplified UML collaboration diagram of elements of the presentation
layer described in the system of Figs. 6 and 7, operative in accordance with a preferred
embodiment of the present invention;

Fig. 9 is a simplified UML sequence diagram of a method of operation of the 
system of Figs. 6 and 7, operative in accordance with a preferred embodiment of the present
invention; 

Fig. 10 is a simplified UML activity diagram of a method of operation of the 
system of Figs. 6 and 7, operative in accordance with a preferred embodiment of the present 
invention; 

Fig. 11 is a simplified block diagram illustration of a method of multi-player
operation of the system of Figs. 6 and 7, operative in accordance with a preferred
embodiment of the present invention;

Fig. 12 is a simplified UML activity diagram of a method of multi-player
operation of the system of Figs. 6 and 7, operative in accordance with a preferred
embodiment of the present invention;

Fig. 13 is a simplified UML collaboration diagram of a game engine, 
constructed and operative in accordance with a preferred embodiment of the present 
invention; 

Fig. 14 is a simplified pictorial illustration of aspects of virtual performer 
implementation, operative in accordance with a preferred embodiment of the present
invention; 

Fig. 15 is a simplified pictorial illustration of aspects of virtual performer 
implementation, operative in accordance with a preferred embodiment of the present 
invention; 

Fig. 16 is a simplified pictorial illustration of aspects of virtual theater
implementation, operative in accordance with a preferred embodiment of the present
invention; and

Fig. 17 is a simplified flowchart illustration of league-based telephonic
entertainment, operative in accordance with a preferred embodiment of the present
invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Reference is now made to Fig. 1, which is a simplified flowchart illustration of a
method for operating a telephone entertainment program, operative in accordance with a
preferred embodiment of the present invention. In the method of Fig. 1 a caller accesses a
telephone entertainment system using a voice communications network, such as the
telephone network. A preferred implementation of a telephone entertainment system is
described in greater detail hereinbelow with reference to Figs. 6 and 7. The system then
answers the call through a voice network interface. The caller's identity is then ascertained,
such as by using Caller ID and/or having the caller enter a PIN number. The system then
retrieves the caller's profile, where a profile has been previously established for the caller,
and a list of audio entertainment programs, herein referred to as telephone shows, that
would be appropriate for the caller based on the caller's profile. The system then presents
the user with an audio message, typically including a personalized greeting, with a
suggestion of possible shows that the caller may choose from. The user then responds,
typically by voice, requesting one of the options.

The system typically maintains pre-recorded audio segments, such as sound 
effects, music, audience responses, speech, etc., pre-defined speech recognition grammar 
definitions, and script segments, in addition to maintaining the caller's profile and history of 
system usage. When a telephone show is selected, the system then composes the telephone
show's audio content using the show's flow script, associated audio segments and
grammars, and the user profile and history. A preferred implementation of a flow script is 
described in greater detail hereinbelow with reference to Fig. 2. 

The system then plays the audio content, and listens to the caller's spoken
words or other input, using the responses to affect the progress and development of the
content of the show. The flow of the script is navigated based on the sequence of caller
responses. The system may create dynamic plots built by piecing together script segments
based on decisions made at decision or branching points within the script. The result is a
unique show that is the product of the interaction between the script segments and the
caller's input.

Several callers can participate together in a single telephone show, where each
caller assumes a different role. The dynamic plot of such a show is thus determined by the
combined decisions of the callers. The callers can be aware of each other's identity, such as
where a caller invites a friend to participate in the show. A preferred method of managing
multi-user shows is described in greater detail hereinbelow with reference to Figs. 11 and
12.

The telephone entertainment system may employ non-audio input and output
media for use within the context of a telephone show, such as the text-based input of SMS
or MMS messaging, which may be used to send show updates, information, and invitations
to callers. Callers may also send such messages to the system.

The system preferably records the history of a caller's participation. This
information enables the system to respond in the future based on the caller's previous
behavior. The history may also serve as the basis for market research, targeted advertising,
and future telephone shows.

One exemplary operational scenario of the method of Fig. 1 may be illustrated
within the context of a humorous quiz show, in which characters, such as the characters of
the television show The Simpsons®, interact within a game show environment having comic
elements. In this scenario the caller plays the role of a contestant interacting with any of six
characters on the show. Four are the Simpson family members, a fifth is the announcer, and
the sixth is the studio audience. Six "virtual performers" play these six roles in what is
referred to herein as a "virtual theater." A preferred method of implementing a virtual
theater is described in greater detail hereinbelow with reference to Fig. 5.

The show opens with a theme song and an announcer who then introduces the
show and its host, Bart Simpson. Bart then welcomes the contestant (the Caller). Bart
introduces his panel of expert questioners: Marge (his mom), Homer (his dad), and Lisa (his
sister). Bart then asks the first questioner to ask the first question. Bart is pre-programmed
with a certain probability to offer a quip about the questioner, while one or more of the
questioners are pre-programmed with a certain probability to respond to Bart's quip. Other
pre-programmed possibilities may, for example, enable the questioners and Bart to get into
funny arguments. The questioners can react with good humor or they can become insulted
and defensive in keeping with their pre-programmed personality traits. The audience "hears"
the banter and reacts appropriately, such as with catcalls, cheering, laughter, etc. Bart calls
the group to order, gets the show back on track, and the questioner asks a question. The
caller responds to the questions with verbal answers.

Another exemplary operational scenario of the method of Fig. 1 may be 
illustrated within the context of a baseball game simulation. In this scenario it's the World 
Series during a tie-breaking game, with the caller's team leading 1-0. It's the bottom of the
ninth inning, two outs, bases loaded, and the caller is the pitcher. If the caller strikes the 
batter out, the caller's team wins. The caller hears the screams of the crowd, while a 
sportscaster describes the scene. 

The telephone entertainment system maintains the statistics of the teams and 
players, as well as the rules of the game. The system consults its database, makes various 
game-play decisions, and creates game events. The batter typically represents a real-world 
baseball personality that is implemented as a virtual performer. The batter typically has a 
high degree of behavioral autonomy, limited only by the need to remain consistent with the 
stored facts about the real-world batter that he represents. Other virtual performers play the 
parts of the sportscaster and the crowd, pre-programmed to observe and react to the game 
as they "see" it. The caller wins the game by learning the strategy of the opposing batters as 
play progresses, and by choosing the correct pitches to strike opposing batters out. The 
caller uses voice commands to select the pitch type (e.g., fast ball, curve ball, change-up, 
slider) and location (e.g., high, middle, low, inside, center, outside), with the system 
carrying out the pitch and applying appropriate game rules and probabilities. 

The telephone show flow may be illustrated as follows. 

1. The caller picks the team he'll pitch against.

2. The system then selects the lineup of batters. 

3. The system then:

a. Decides who's next up to bat.

b. Sends the batter to the plate.

4. The Sportscaster describes the scene: 

a. Announces the batter's name.

b. Describes his approach to the plate. 

c. Describes the mood of the crowd. 

d. Emphasizes the tension of the moment.

e. Gives background "color" commentary about the batter.

5. The crowd reacts to the scene. 

6. The caller calls for a specific pitch, possibly including location or other control 
elements. 

7. The system decides: 

a. How accurate the pitch is. 

b. Whether the pitch is in the strike zone or not. 

c. If not, whether the pitch is inside, outside, high, or low. 

d. What this particular batter is likely to do based on the stored real-life 
statistics for this particular batter. 

e. Based on 7a and 7b the system creates the appropriate game event 
representing the outcome of the pitch. 

8. The Sportscaster reports on the event, which the system has created, using
information stored in the statistics to add color commentary. 

9. The crowd reacts to the event. 

If another batter is needed then control returns to Step 3. 
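Step 7 of the flow above, in which the system decides the pitch outcome from game rules and probabilities, could be sketched as follows. The accuracy figures, the batter statistics, and the halved contact chance outside the strike zone are invented for illustration, and foul balls are omitted to keep the sketch short.

```python
import random

# Hypothetical batter statistics; the numbers are illustrative, not real data.
BATTER = {"name": "Batter A", "swing_rate": 0.6, "contact_rate": 0.75}

# Assumed chance that each pitch type lands in the strike zone.
PITCH_ACCURACY = {
    "fast ball": 0.7,
    "curve ball": 0.55,
    "slider": 0.6,
    "change-up": 0.65,
}


def resolve_pitch(pitch_type, batter, rng):
    """Return a game event for one pitch: 'ball', 'strike', or 'hit'."""
    in_zone = rng.random() < PITCH_ACCURACY[pitch_type]
    swings = rng.random() < batter["swing_rate"]
    if not swings:
        # A taken pitch is a called strike in the zone, otherwise a ball.
        return "strike" if in_zone else "ball"
    # Assumed penalty: contact is half as likely on pitches outside the zone.
    contact_chance = batter["contact_rate"] * (1.0 if in_zone else 0.5)
    return "hit" if rng.random() < contact_chance else "strike"


def at_bat(pitches, batter, rng):
    """Play pitches until a strikeout, walk, or hit; return the outcome."""
    balls = strikes = 0
    for pitch_type in pitches:
        event = resolve_pitch(pitch_type, batter, rng)
        if event == "hit":
            return "hit"
        if event == "strike":
            strikes += 1
        else:
            balls += 1
        if strikes == 3:
            return "strikeout"
        if balls == 4:
            return "walk"
    return "unfinished"
```

Passing a seeded `random.Random` instance makes a simulated at-bat repeatable, which is convenient when testing the commentary that would be generated from each event.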

Another exemplary operational scenario of the method of Fig. 1 may be 
illustrated within the context of a soccer game simulation. In this scenario it's the final 
minutes of a crucial cup game between two rival teams, Manchester United and Arsenal, 
and the score is tied at 0 - 0. The caller must pass and maneuver the ball across the field to 
a scoring position, shoot, and score in order to win. The caller wins the game by learning
the strategy of the opponent as play progresses, and by choosing the correct moves to avoid 
being blocked and to score. 

The system maintains a description of the scoring probabilities at the key points 
on the field. The caller controls his team both on offense and on defense. On offense, the 
caller can choose to move the ball right, left, ahead, or back, depending on the prompted 
choices for the current position on the field, or shoot. The opponent team is played by the 
system as a Non-Player Character (NPC), and implements a strategy of anticipating the 
caller's commands. If the NPC correctly anticipates the caller's command, the play is
blocked and the participant can lose possession of the ball. When on defense, the caller
chooses the anticipated move of the ball by the offensive NPC. If the caller correctly
anticipates the NPC's choice, the play is blocked and the participant can gain possession of
the ball. The caller selects his move using voice commands.

Game time is kept by the system. The caller wins the game by leading at the end
of game time, as in an actual soccer game. If there is a tie, the participant can choose to 
continue to play in sudden death overtime. 
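The anticipation mechanic described above can be sketched as a small guessing game. The NPC strategy shown here, which guesses the caller's most frequent past move, is only one assumed possibility; the specification does not state how the NPC anticipates commands.

```python
import random

MOVES = ["left", "right", "ahead", "back"]


def npc_guess(caller_history, rng):
    """Anticipate the caller's next move by favoring the most used move."""
    if not caller_history:
        return rng.choice(MOVES)
    return max(MOVES, key=caller_history.count)


def resolve_play(caller_move, caller_history, rng):
    """Return 'blocked' if the NPC anticipated the move, else 'advanced'."""
    guess = npc_guess(caller_history, rng)
    caller_history.append(caller_move)
    return "blocked" if guess == caller_move else "advanced"
```

Because the NPC learns from the history, a caller who repeats the same move is blocked more often, which matches the idea that the caller wins by varying play and learning the opponent's strategy.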

The telephone show flow may be illustrated as follows. 

1. The team's coach greets the caller.

2. The coach explains that the game against the opponent team is in the last few
minutes of play and that the caller's help is needed to win the game.

3. An NPC will play the part of the opponent.

4. The Sportscaster begins the play-by-play coverage of the game in progress.

5. After a few exciting moments of play, the action freezes at a decision point. This is a
point where a player needs to make a critical decision, such as which way to pass the
ball.

6. If, at this decision point, the caller's team has the ball, the caller will step into the
part of the player with the ball and choose which way to pass. The caller makes his 
choice verbally, such as by saying "left" or "ahead" if the choices offered were to 
pass left or to pass ahead. 

7. If, at this decision point, the opponent team is in possession, the caller attempts to 
outthink the opponent and block him by predicting which movement or action the 
opponent will choose. 

8. The caller then hears of the success or failure of his action as described by the 
sportscaster. 

9. At any decision point the player with the ball can, in addition to passing the ball, 
choose to shoot. Success is based on the system's analysis of the shooter's identity 
and distance from the net.

10. When a goal is made, the sportscaster announces the score. 

11. Actual play time is recorded by the system. When the clock runs out, the referee 
blows his whistle and the sportscaster announces the end of the game. 

The soccer game scenario may be further adapted to allow two callers to play
the game at the same time, with each caller controlling a different team. As the system
plays a script segment, both players hear that segment. When a decision point is reached,
each caller is prompted separately for a decision on what to do next. Preferably, neither
caller hears the voice of the other caller. The system receives the two responses and selects
the next script segment, playing the appropriate audio for the determined outcome event,
which is heard by both players. Where the callers speak in different languages, they may
respond and hear audio segments in their respective tongues. The game may be part of a
simulated soccer league in which different callers would represent different teams which are
determined by a virtual or real league. The callers may then be ranked in league standings
based on their performance in the games.
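Ranking callers in league standings could look like the sketch below. The scoring scheme (three points for a win, one for a draw, ties broken by goal difference) is an assumption borrowed from common soccer leagues and is not stated in the specification; the caller names are placeholders.

```python
from collections import defaultdict


def league_standings(results, win_points=3, draw_points=1):
    """Rank callers from (caller_a, goals_a, caller_b, goals_b) results."""
    points = defaultdict(int)
    goal_diff = defaultdict(int)
    for a, goals_a, b, goals_b in results:
        goal_diff[a] += goals_a - goals_b
        goal_diff[b] += goals_b - goals_a
        if goals_a > goals_b:
            points[a] += win_points
            points[b] += 0  # make sure the loser appears in the table
        elif goals_b > goals_a:
            points[b] += win_points
            points[a] += 0
        else:
            points[a] += draw_points
            points[b] += draw_points
    # Sort by points, then goal difference, then name for a stable order.
    return sorted(points, key=lambda c: (-points[c], -goal_diff[c], c))
```

The function returns caller names in ranked order, so the first entry is the current league leader.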

Another exemplary operational scenario of the method of Fig. 1 may be
illustrated within the context of a virtual world simulation. In this scenario the caller is on a
river in a virtual world of intertwining rivers. A virtual performer guide is provided who is
pre-programmed with knowledge about the kinds of flora, fauna, and local tribesmen that
the caller is likely to encounter, but the guide is not familiar with the maze of intertwining
rivers. Each time the caller explores a new bend in the river, by providing a verbal command
in response to a prompt to provide a direction in which to move along the river, the guide
describes what he "sees". The guide is also capable of remembering where the caller has
been in order to return to previously visited points. The stretches of river between decision
points requiring caller input may be used to play distinctive sounds and audio background,
which may differ for different points along the river. With repeated visits to the virtual
world simulation, the caller can learn to navigate the maze of rivers. This experience may
also be provided for several callers simultaneously, providing the callers the possibility of
collaborating and cooperating.
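
By way of non-limiting illustration only, the guide's memory of visited points may be sketched in Python as follows. The map contents, the class name RiverGuide, and the direction words are hypothetical and do not appear in the specification; the sketch also assumes that the caller's spoken direction has already been recognized and is available as text.

```python
class RiverGuide:
    """Tracks where the caller has been so the guide can lead the way back."""

    def __init__(self, river_map, start):
        self.river_map = river_map   # junction -> {spoken direction: next junction}
        self.position = start
        self.trail = [start]         # junctions visited, in order

    def move(self, direction):
        """Follow a recognized direction; return None if it is not available here."""
        options = self.river_map.get(self.position, {})
        if direction not in options:
            return None              # the caller would be prompted again
        self.position = options[direction]
        self.trail.append(self.position)
        return self.position

    def go_back(self):
        """Return to the previously visited junction, if there is one."""
        if len(self.trail) > 1:
            self.trail.pop()
            self.position = self.trail[-1]
        return self.position


# Hypothetical maze used only for this sketch.
RIVER_MAP = {
    "landing": {"left": "falls", "right": "lagoon"},
    "falls": {"left": "village", "right": "lagoon"},
    "lagoon": {"left": "landing"},
    "village": {},
}
```

A distinct audio background could be keyed on each junction name in the same map; that mapping is omitted here.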

Another exemplary operational scenario of the method of Fig. 1 may be
illustrated within the context of a virtual world simulation based on the Harry Potter™
novels. In this scenario the caller assumes the role of a student at Hogwarts, the school for
wizards in a magic castle. The caller is then sent on a quest, and meets characters and
creatures familiar from the Harry Potter™ story. The caller moves through a series
of interconnected paths that lead out from Hogwarts to four areas. A different type of
magical creature, such as Pixies, Banshees, Gnomes and Trolls, populates each area. The
caller's task is to obtain the trophy found at the end of one of the paths, and return it to
Hogwarts. The caller navigates by giving voice commands when prompted, indicating a
desired direction of movement.

WO 03/019917 PCT/IL02/00712

Reference is now made to Fig. 2, which is a simplified illustration of a flow
script structure, operative in accordance with a preferred embodiment of the present
invention. A flow script is used to describe the flow of the telephone show and the
kinds of things that may happen during the show. The flow script structure of Fig. 2 is
designed to include multiple script segments that may be selectably carried out in various
combinations. The flow script specifies interactive decision points called "plot points" that
may branch into multiple script segments. A flow script may be used to implement a
plot point model including key plot points of dramatic and interactive tension, and the
transitions between them. The plot points present opportunities for the caller's interaction,
and enable the caller to determine the flow of the show.

The flow script structure of Fig. 2 employs a tag-based syntax that is
comparable in style to the tag-based syntax used in the Extensible Markup Language (XML).
An example of this flow script structure is a syntax referred to herein as Game Definition
Markup Language (GDML). The elements of the flow script structure of Fig. 2 include:

• Bubble: A bubble represents something that is said by a character in the show.
Examples of bubbles are: "Sportscaster: 'Nimni passes to the right!'" or "Coach:
'Now you may choose to pass left or right'". The specification of a bubble may
reference a specific bubble file.

• Bubble Class: A bubble class is a set of bubbles describing the same semantic idea, but with
various alternatives, such as by using "flexible speech," described in greater detail
hereinbelow with reference to Figs. 3 and 4. When a bubble class is specified for
presentation, one of the bubbles belonging to the bubble class will be presented. An
example of two bubbles which can belong to the same bubble class: "Maccabi indeed
came ready for this game" and "What a performance by Maccabi".
A Bubble Class is specified similarly to a Bubble using the Bubble tag, but
references a folder or directory that contains a set of Bubbles.

• Bubble String: A sequence of bubbles that forms a segment of the script. The
bubbles in a bubble string are played one after the other with no pauses in between.
Some or all of the bubbles in a bubble string can be defined as bubble classes.

• Context: The context specifies the dynamic context of the Bubble or BubbleClass.
One example would be in Sports-based telephone shows, where the same transition
is described for different teams. By specifying the dynamic context to be the current
Offensive team, the Bubbles and BubbleClasses may be selected dynamically based
on the current teams playing. This enables the GDML specification to be generalized
for a particular format, genre, or show, without requiring specific content to be
specified. For example, the description of a transition could be specified based on
the context of the current attacking team on Offense, and the resulting Defensive
maneuver that leads up to the PlotPoint would be specified based on the current
team on Defense. Bubbles and BubbleStrings need to be provided for each context.
For example, for each team in the Soccer league, Bubbles and BubbleClasses would be
provided for every specification where a team-specific context is specified.

• Role: The caller may have a dynamic or static role in the show. An example of a
dynamic role would be in a Sports-based telephone show where a caller's team may
move between offensive and defensive roles. Based on the role of the user in the
show, different BubbleStrings and Grammars will be appropriate. The Role tag
provides the role assignment for BubbleStrings and Grammars. BubbleStrings that
describe a PlotPoint decision and a Grammar defining the caller's choices would be
different if the caller's role is currently Offense or Defense. The role may also be
"both", with a BubbleString being specified for all caller roles.

• Transition: A Transition specifies the segment of the game which leads from one
plot point to another. The body of the Transition definition is comprised of
BubbleStrings.

• PlotPoint: A PlotPoint specifies the point of user interaction, i.e. a decision point.
The specification of the PlotPoint includes the specification of the prompts and the
transition rule.

• Prompts: The prompts specify the user interaction at the plot points. This
specification includes the Grammars which determine the interpretation of user input
and the BubbleStrings which the caller hears.

• Grammars: The grammars specify the natural language interpretation of the caller
response by referencing a grammar specification. Examples of such grammars
include a Speech Recognition Grammar Specification or an n-Gram Language
Specification as defined by the W3C. The Grammar tag also specifies how to
prompt the caller where there is no recognized input.

• Rules: The rules specify what action should be taken based on the user decision.
This is specified by defining the decision input of all the roles at the PlotPoint, such
as the decisions by offense and defense in a Soccer game, and the resulting
transition. Rules can also specify domain-specific methods for state evaluation. An
example of this kind of rule would be in a baseball show, where the result of a caller
selection of a pitch is determined both by the pitch as well as the game state, team
statistics, and a simulation model of baseball. A baseball state machine component is
used by the rule to determine the outcome and effect the selection of the resulting
plot point by the rule.

The flow script defines the bubbles, bubble classes and bubble strings at the plot
points, the interaction and dialog, and the set of transitions that may result depending on the
caller's responses. It also defines the bubble strings that make up the transitions. By
convention, the starting point of the flow script is a special plot point referred to as the
"null" plot point. This plot point specifies the "bootstrap" transition that begins the show.

For example, in the Soccer Telephone Show, when a plot point is reached, a
caller is presented with a choice by the coach of his team, in response to which the caller
makes a decision that affects the course of the game. This results in the next transition
bubble string being output to the caller. This transition describes what happened on the field
as a result of the caller's choice at the plot point, indicating, for example, whether the shot
was blocked or whether possession of the ball was lost.
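
By way of non-limiting illustration only, the progression from the "null" plot point through successive plot points and transitions may be sketched in Python as follows. The class names, the run_show function, and the transition and bubble names other than "initialTr", "Nullto01" and the plot point names are hypothetical. The sketch assumes a single caller whose decisions have already been recognized as text, and it is not the GDML interpreter itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Transition:
    name: str
    ending: str      # name of the plot point reached after the transition
    bubbles: list    # bubble references played in order


@dataclass
class PlotPoint:
    name: str
    rules: dict = field(default_factory=dict)  # caller decision -> transition name
    bootstrap: Optional[str] = None            # set only on the "null" plot point


def run_show(plot_points, transitions, decisions):
    """Walk the flow script; `decisions` stands in for recognized caller input."""
    played = []
    pending = iter(decisions)
    # The "null" plot point names the bootstrap transition that begins the show.
    transition = transitions[plot_points["null"].bootstrap]
    while True:
        played.extend(transition.bubbles)
        point = plot_points.get(transition.ending)
        if point is None or not point.rules:
            return played        # no further decision point is defined
        choice = next(pending, None)
        if choice not in point.rules:
            return played        # no usable input; a real system would re-prompt
        transition = transitions[point.rules[choice]]


# Hypothetical two-way plot point used only for this sketch.
PLOT_POINTS = {
    "null": PlotPoint("null", bootstrap="initialTr"),
    "01Plp": PlotPoint("01Plp", rules={"left": "passLeftTr", "right": "passRightTr"}),
}
TRANSITIONS = {
    "initialTr": Transition("initialTr", "01Plp", ["Nullto01"]),
    "passLeftTr": Transition("passLeftTr", "05Plp", ["PassLeft"]),
    "passRightTr": Transition("passRightTr", "06Plp", ["PassRight"]),
}
```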

The following is a sample GDML flow script for a simple Soccer Telephone
Show. It shows the various plot points and transitions, as well as soccer-specific concepts
such as offensive and defensive roles.

<gdml>
  <!-- Set Duration -->
  <duration seconds="150"/>
  <plotPoints>
    <!-- Plot Points -->
    <plotPoint name="nullPlp" special="null">
      <rules>
        <rule transition="initialTr"/>
      </rules>
    </plotPoint>
    <plotPoint name="01Plp" scoringFactor="5">
      <grammars>
        <grammar role="offense" reference="soccer">
          <norecs>
            <bubble reference="01PlpOfNoRec3" context="offense"/>
          </norecs>
        </grammar>
        <grammar role="defense" reference="soccer">
          <norecs>
            <bubble reference="01PlpDfNoRec1" context="defense"/>
            <bubble reference="01PlpDfNoRec2" context="defense"/>
            <bubble reference="01PlpDfNoRec3" context="defense"/>
          </norecs>
        </grammar>
      </grammars>
      <bubbleStrings>
        <bubbleString role="both">
          <bubble reference="At01" context="offense"/>
        </bubbleString>
        <bubbleString role="offense">
          <bubble reference="01PlpOf" context="offense"/>
        </bubbleString>
        <bubbleString role="defense">
          <bubble reference="01PlpDf" context="defense"/>
        </bubbleString>
      </bubbleStrings>
      <rules>
        <rule offense="left" defense="right" transition="01to05Tr"/>
        <rule offense="right" defense="left" transition="01to06Tr"/>
        <rule offense="right" defense="right" transition="01to06TrR09"/>
        <rule offense="left" defense="left" transition="01to05TrR10"/>
        <rule shot="good" transition="01ShG110"/>
        <rule shot="bad" transition="01ShBd10"/>
      </rules>
    </plotPoint>
  </plotPoints>
  <transitions>
    <transition name="01to05Tr" starting="01Plp" ending="05Plp"
                cop="false" changesScore="false">
      <bubbleString role="offense">
        <bubble reference="VoOfLTr" context="general"/>
      </bubbleString>
      <bubbleString role="defense">
        <bubble reference="VoDfRlTr" context="general"/>
      </bubbleString>
      <bubbleString role="both">
        <bubble reference="01to05Tr" context="offense"/>
      </bubbleString>
    </transition>
    <transition name="01to05TrR10" starting="01Plp" ending="10Plp"
                cop="true" changesScore="false">
      <bubbleString role="offense">
        <bubble reference="VoOfLTrR" context="general"/>
      </bubbleString>
      <bubbleString role="defense">
        <bubble reference="VoDfLlTrR" context="general"/>
      </bubbleString>
      <bubbleString role="both">
        <bubble reference="01to05TrR10" context="offense"/>
      </bubbleString>
    </transition>
    <transition name="01to06Tr" starting="01Plp" ending="06Plp"
                cop="false" changesScore="false">
      <bubbleString role="offense">
        <bubble reference="VoOfRTr" context="general"/>
      </bubbleString>
      <bubbleString role="defense">
        <bubble reference="VoDfLrTr" context="general"/>
      </bubbleString>
      <bubbleString role="both">
        <bubble reference="01to06Tr" context="offense"/>
      </bubbleString>
    </transition>
    <transition name="01to06TrR09" starting="01Plp" ending="09Plp"
                cop="true" changesScore="false">
      <bubbleString role="offense">
        <bubble reference="VoOfRTrR" context="general"/>
      </bubbleString>
      <bubbleString role="defense">
        <bubble reference="VoDfRrTrR" context="general"/>
      </bubbleString>
      <bubbleString role="both">
        <bubble reference="01to06TrR09" context="offense"/>
      </bubbleString>
    </transition>
    <!-- End Shoot Goals -->
    <!-- Beginnings & Endings -->
    <!-- Beginnings -->
    <transition name="initialTr" starting="nullPlp" ending="01Plp"
                cop="false" changesScore="false">
      <bubbleString role="both">
        <bubble reference="Nullto01" context="defense"/>
      </bubbleString>
    </transition>
    <!-- Endings -->
    <!-- Non-tie Ending -->
    <transition name="endingDefeat" starting="endingPlp"
                ending="finalPlp" cop="false" changesScore="false">
      <bubbleString role="both">
        <bubble reference="SportsEnd" context="general"/>
        <bubble reference="SportsWinner" context="winning"/>
        <bubble reference="name" context="winning"/>
        <bubble reference="%v:winnerScore%" context="general"/>
        <bubble reference="name" context="losing"/>
        <bubble reference="to%v:loserScore%" context="general"/>
      </bubbleString>
      <bubbleString role="winning">
        <bubble reference="CoachWin" context="winning"/>
      </bubbleString>
      <bubbleString role="losing">
        <bubble reference="CoachLost" context="losing"/>
      </bubbleString>
    </transition>
    <!-- Tie Ending -->
    <transition name="endingTie" starting="endingPlp" ending="finalPlp"
                cop="false" changesScore="false">
      <bubbleString role="both">
        <bubble reference="SportsEnd" context="general"/>
        <bubble reference="SportsTie" context="general"/>
        <bubble reference="name" context="winning"/>
        <bubble reference="%v:winnerScore%" context="general"/>
        <bubble reference="name" context="losing"/>
        <bubble reference="to%v:loserScore%" context="general"/>
      </bubbleString>
      <bubbleString role="winning">
        <bubble reference="CoachTie" context="winning"/>
      </bubbleString>
      <bubbleString role="losing">
        <bubble reference="CoachTie" context="losing"/>
      </bubbleString>
    </transition>
    <!-- END of Beginnings & Endings -->
  </transitions>
</gdml>



The GDML flow script above provides the bubbles and bubble strings required
for the various game and caller roles. For example, if the caller's team is currently on
defense, the defense bubble string is used. If the role specified for the bubble string
is "both", the bubble string may be used when the caller's team is in either an Offense or
Defense role.
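
By way of non-limiting illustration only, rule lookup and role-based bubble string selection may be sketched in Python using the standard xml.etree.ElementTree module. The GDML fragment below is a trimmed, well-formed excerpt of the sample script; the function names select_transition and bubbles_for_role are hypothetical, and playing role-specific and "both" bubble strings in document order is an assumption rather than a stated rule.

```python
import xml.etree.ElementTree as ET

GDML = """
<gdml>
  <plotPoints>
    <plotPoint name="01Plp" scoringFactor="5">
      <rules>
        <rule offense="left" defense="right" transition="01to05Tr"/>
        <rule offense="right" defense="left" transition="01to06Tr"/>
        <rule offense="right" defense="right" transition="01to06TrR09"/>
        <rule offense="left" defense="left" transition="01to05TrR10"/>
      </rules>
    </plotPoint>
  </plotPoints>
  <transitions>
    <transition name="01to05Tr" starting="01Plp" ending="05Plp"
                cop="false" changesScore="false">
      <bubbleString role="offense">
        <bubble reference="VoOfLTr" context="general"/>
      </bubbleString>
      <bubbleString role="defense">
        <bubble reference="VoDfRlTr" context="general"/>
      </bubbleString>
      <bubbleString role="both">
        <bubble reference="01to05Tr" context="offense"/>
      </bubbleString>
    </transition>
  </transitions>
</gdml>
"""


def select_transition(root, plot_point, offense, defense):
    """Return the transition named by the rule matching both callers' decisions."""
    point = root.find(f"./plotPoints/plotPoint[@name='{plot_point}']")
    for rule in point.iter("rule"):
        if rule.get("offense") == offense and rule.get("defense") == defense:
            return rule.get("transition")
    return None


def bubbles_for_role(root, transition, role):
    """Collect bubble references the caller in `role` hears for a transition."""
    node = root.find(f"./transitions/transition[@name='{transition}']")
    refs = []
    for bubble_string in node.findall("bubbleString"):
        if bubble_string.get("role") in (role, "both"):
            refs.extend(b.get("reference") for b in bubble_string.findall("bubble"))
    return refs
```

For instance, parsing the fragment with ET.fromstring(GDML) and calling select_transition with offense "left" and defense "right" yields the transition name "01to05Tr", whose bubble strings may then be filtered for each caller's role.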

Reference is now made to Fig. 3, which is a simplified flowchart illustration of a
method of implementing flexible speech, operative in accordance with a preferred
embodiment of the present invention. As was described hereinabove with reference to Fig.
2, the flow script of the present invention provides semantic directions rather than specific
lines of speech. To carry out flow script speech directives, flexible speech may be used
where a single conversational element, such as a greeting, is expressed in a variety of ways
and individually recorded, and then the individual recordings are provided for use by a
virtual performer. An example of such a speech directive is in the form of a Bubble Class.
The virtual performer then joins together multiple pre-recorded elements to form sentences.

Thus, for example, the following pre-recorded versions may be made to express
a greeting: "Hi there!", "Hi!", "Hello", "What's up?", "How's by you?", "How've you
been?", "Long time no see", "Hi there stranger", "Hi, What've you been up to?", "How are
you?", "How are you feeling?", "You're looking good, I hope you're feeling better.", "Hi, I
was sorry to hear that you were sick". Each pre-recorded greeting variant may also be
recorded multiple times with different types of expression. Thus, the single greeting
"How've you been?" may be recorded in a happy voice, an angry voice, a sullen voice, a
timid voice, and a suspicious voice.

In one implementation of flexible speech, a caller history may be maintained for
each virtual performer indicating the type of relationship the virtual performer has with each
caller, such as friendly or adversarial. Based on this history the virtual performer may
decide on the appropriate type of greeting to use when he next encounters the caller,
selecting a greeting that is appropriate for both the current script context as well as the
current state of the relationship.

In another implementation of flexible speech, a virtual performer selects a mode
of behavior at random or in pre-association with various flow script states, and selects
appropriate speech elements that are associated with the selected mode of behavior.
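
By way of non-limiting illustration only, the history-based greeting selection described above may be sketched in Python as follows. The greeting texts are taken from the examples above, but their assignment to "friendly" or "adversarial" relationships, the class name VirtualPerformer, and the default of treating an unknown caller as friendly are assumptions made for this sketch. The strings stand in for references to pre-recorded audio.

```python
import random

# Hypothetical grouping of pre-recorded greetings by relationship type.
GREETINGS = {
    "friendly": ["Hi there!", "Long time no see", "How've you been?"],
    "adversarial": ["What's up?", "Hi there stranger"],
}


class VirtualPerformer:
    def __init__(self, name, rng=None):
        self.name = name
        self.caller_history = {}           # caller id -> relationship type
        self.rng = rng or random.Random()

    def record(self, caller_id, relationship):
        """Remember the state of the relationship with a caller."""
        self.caller_history[caller_id] = relationship

    def greet(self, caller_id):
        """Pick one greeting variant suited to the recorded relationship."""
        relationship = self.caller_history.get(caller_id, "friendly")
        return self.rng.choice(GREETINGS[relationship])
```

The same selection could be keyed additionally on a mode of behavior or on the current flow script state by extending the dictionary key.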

Additional reference is now made to Fig. 4, which is a simplified pictorial
illustration of a flexible speech association structure, constructed and operative in
accordance with a preferred embodiment of the present invention. Fig. 4 shows how
multiple speech elements of one type may be associated with multiple speech elements of
another type, which in turn may be associated with multiple speech elements of another
type, and so on. In Fig. 4 a speech element 400 of type "greeting" is shown having several
pre-recorded variations. Speech element 400 is associatively followed by a speech element
402 of type "request" which is also shown having several pre-recorded variations. Speech
element 402 is in turn associatively followed by a speech element 404 of type "action"
which is also shown having several pre-recorded variations. Finally, speech element 404 is
followed by a speech element 406 of type "object" of which a single pre-recorded variant is
shown. As may be clearly seen, a single sentence may be formed in many different ways by
selecting one variant of each speech element.
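
By way of non-limiting illustration only, the chaining of speech elements may be sketched in Python as follows. The element types follow the description of Fig. 4, but the variant texts for "request", "action" and "object" are hypothetical, since the contents of the figure are not reproduced here. The number of distinct sentences is the product of the number of variants of each element.

```python
import random
from math import prod

# Ordered speech elements; each entry is (type, pre-recorded variants).
SPEECH_ELEMENTS = [
    ("greeting", ["Hi there!", "Hello", "What's up?"]),  # from the greeting examples
    ("request", ["could you please", "would you"]),       # hypothetical variants
    ("action", ["pass", "hand me"]),                       # hypothetical variants
    ("object", ["the ball"]),                              # hypothetical variant
]


def count_sentences(elements):
    """Number of distinct sentences obtainable from one variant per element."""
    return prod(len(variants) for _, variants in elements)


def build_sentence(elements, rng):
    """Join one randomly chosen variant of each element, in order."""
    return " ".join(rng.choice(variants) for _, variants in elements)
```

With the variants assumed above, 3 × 2 × 2 × 1 = 12 distinct sentences are available from eight recordings.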

Reference is now made to Fig. 5, which is a simplified conceptual illustration of
a virtual theater architecture, constructed and operative in accordance with a preferred
embodiment of the present invention. In the virtual theater architecture of Fig. 5, a stage
manager 500 interprets a flow script 502 of a telephone show. To carry out flow script
502, stage manager 500 sends messages to one or more virtual performers 504 that are each
assigned the task of playing the role of a specific character in the telephone show. A stage
506 maintains show state information and acts as the venue where virtual performers 504
exhibit their behavior. Each virtual performer 504 maintains a set of behavior rules 508 and
a behavior history 510 which includes past behavior as well as its current state, and is
capable of "watching" stage 506 to monitor show state information and of receiving
messages from stage manager 500. Each virtual performer 504 determines its own
behavior, such as speech or other non-verbal actions or gestures, by applying its behavior
rules 508 to show state information, incoming messages, and its behavior history 510. If a
reaction to stage events and/or messages is warranted, virtual performer 504 expresses its
behavior on stage 506 by means of one or more messages sent to stage 506 for other virtual
performers 504 to "see" and "hear", to which they may in turn react. Thus, in the virtual
theater architecture, the execution of a GDML flow script is "delegated" in part to the stage
manager and virtual performers who determine which, how, and in what order bubbles and
bubble classes are performed based on rules provided by the GDML script and by their own
rule sets. This mechanism provides a balance between the scripted dramatic story line and
the spontaneity of the autonomous virtual performers.
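
By way of non-limiting illustration only, the interaction between the stage and the virtual performers may be sketched in Python as follows. The class and method names are hypothetical, behavior rules are reduced to a simple mapping from an observed event to a reaction, and the sketch does not represent any particular agent framework.

```python
class Stage:
    """Venue that records events and lets every other performer observe them."""

    def __init__(self):
        self.performers = []
        self.log = []                      # (performer name, event) in order

    def join(self, performer):
        self.performers.append(performer)

    def post(self, sender, event):
        self.log.append((sender.name, event))
        for performer in self.performers:
            if performer is not sender:
                performer.observe(self, sender, event)


class Performer:
    def __init__(self, name, rules):
        self.name = name
        self.rules = rules                 # observed event -> reaction event
        self.history = []                  # behavior history

    def observe(self, stage, sender, event):
        """'See' or 'hear' an event on stage and react if a rule applies."""
        self.history.append((sender.name, event))
        reaction = self.rules.get(event)
        if reaction is not None:
            stage.post(self, reaction)

    def direct(self, stage, event):
        """Carry out a directive received from the stage manager."""
        self.history.append(("stage manager", event))
        stage.post(self, event)
```

In this sketch a performer whose rules map "quip" to "laughter" reacts on its own whenever another performer posts a quip, without the stage manager directing the reaction.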

A vocabulary 512 of words, phrases, and sentences is pre-recorded and
maintained for each virtual performer 504, preferably in a different voice for each virtual
performer 504. For example, the virtual performer playing the role of Homer Simpson uses
stored audio files of Homer's voice to speak Homer's "lines", preferably using flexible
speech as described hereinabove.

Each virtual performer 504 may have a number of states, such as its mood and
the nature of its relationship with each of the other virtual performers, such as annoyance
with one virtual performer and friendship with another.

The information maintained by stage 506 preferably includes general
information such as the list of active virtual performers in the current telephone show and
the characters that they portray. Stage 506 also preferably maintains show-specific
information such as how many times a particular virtual performer has performed a task,
such as asking a question on a quiz show, or how many correct answers the caller has given.
For a baseball game, stage 506 may maintain information such as the identity of the current
batter, the current number of balls and strikes, etc.

A caller may control a particular virtual performer to a pre-defined extent by
providing input where prompted. For example, a caller may control a pitcher in a baseball
show. The caller's command determines, for example, the type of pitch to be thrown.
However, the pitcher may also behave in a semi-autonomous fashion, such as by generating
non-verbal gestures such as "stepping off the mound", "shaking off a pitch," and "wiping his
brow."

Each element of the virtual theater may be implemented using well-known
autonomous intelligent agent architectures. The virtual theater data, rules, and messages
may be implemented using conventional schema and ontologies. The messaging system may
be implemented using standard message schema such as the FIPA Agent Communication
Language.

One exemplary operational scenario of the virtual theater of Fig. 5 may be
illustrated within the context of a more detailed description of the humorous quiz show
described hereinabove. In this scenario stage manager 500 directs the virtual performer
Bart Simpson, in accordance with the directives of flow script 502, to introduce the three
panelists. The virtual performer playing the role of Bart has freedom of choice in carrying
out this task, and proceeds as follows:

1. Bart must first choose which panelist he wants to introduce.

   Bart bases his decision on a list of current panelists, and the history of
   who's been called upon so far in the show (i.e., whose turn is it). This
   information is maintained by stage 506.

2. Bart must then decide if he wants to use a quip.

   Bart bases his decision on his mood, current relationship with that
   character, etc. If his mood is playful, he is more likely to quip. If his
   current relationship with the character he's introducing is friendly, his
   quip will be positive. Both his mood and his relationship with a particular
   character are aspects of Bart's state which may change during the
   performance.

3. Bart then utters the first part of the introduction.

4. Bart then utters a quip, if he so decides, and sends a quip event notice to
   stage 506.

5. Bart then utters the name of the panelist, and sends a notification of the
   event (i.e., having introduced a particular panelist) to stage 506. Bart then
   notifies stage manager 500 that he has completed his task.

An example of Bart's introduction with a quip is: "The next question will be
asked by - that menace to mankind - Homer Simpson". An example of Bart's introduction
without a quip is: "The next question will be asked by - Homer Simpson".

In accordance with flow script 502, stage manager 500 next decides if a retort is
warranted to Bart's quip if made, with stage manager 500 having been notified by Bart that
he made a quip, and with stage manager 500 having "heard" the quip by monitoring stage
506. The virtual performer playing the part of the audience is likewise apprised of Bart's
action on stage 506, and may autonomously decide to react to the quip
with an appropriate response, such as gasping, cheering, laughter, applause, etc. The virtual
performers playing the roles of the panelists also "hear" the positive or insulting quip and
"see" the target of the insult or compliment. The virtual performer who is the target of
Bart's quip may decide to change his/her state to reflect a change in mood and/or a possible
change in his/her relationship with Bart.
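
By way of non-limiting illustration only, Bart's five introduction steps and the event notices he sends to the stage may be sketched in Python as follows. The function name introduce, the quip probabilities, and the positive quip text are hypothetical; only the insulting quip and the sentence frame are taken from the examples above. The random source is passed in so that the decision can be made repeatable.

```python
QUIPS = {
    "positive": "that friend to all",        # hypothetical line
    "insulting": "that menace to mankind",   # quip quoted in the example above
}


def introduce(panelists, already_called, mood, relationships, rng):
    """Return the spoken introduction and the event notices for the stage."""
    # Step 1: choose the next panelist who has not yet been called upon.
    target = next((p for p in panelists if p not in already_called), panelists[0])
    # Step 2: a playful mood makes a quip more likely (illustrative values).
    quip_chance = 0.8 if mood == "playful" else 0.2
    # Step 3: first part of the introduction.
    parts = ["The next question will be asked by"]
    events = []
    # Step 4: optional quip, positive only if the relationship is friendly.
    if rng.random() < quip_chance:
        tone = "positive" if relationships.get(target) == "friendly" else "insulting"
        parts.append(QUIPS[tone])
        events.append(("quip", tone, target))
    # Step 5: name of the panelist and the introduction notice.
    parts.append(target)
    events.append(("introduced", target))
    return " - ".join(parts), events
```

The returned event notices correspond to the messages that the stage would relay to the audience, the panelists, and the stage manager.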

If stage manager 500 decides that a retort to Bart's quip is not called for, then
stage manager 500 directs the panelist whom Bart introduced to ask the contestant a
question.

If stage manager 500 decides that a retort to Bart's quip is called for, stage
manager 500 selects the virtual performer to respond to Bart's quip and directs him to
respond to the quip. As before, the nature of the response is determined using criteria such
as the virtual performer's mood, relationship with the target of the quip, the nature of the
quip itself (e.g., positive or insulting), etc.

When the virtual performer delivers the retort, stage 506 is notified of the
retort, including whether it is positive or negative, to whom it is directed, etc. Stage 506
then notifies all the virtual performers, including the audience who may react with
appropriate cheers, boos, applause, laughter, etc., as well as stage manager 500, of the
retort.

Stage manager 500 then decides if more quips and retorts are required, or if
Bart should be directed to call the panelists to order and a panelist directed to ask a
question of the contestant.

When a panelist is directed to ask a question, the panelist selects a question
from a library of questions, and then asks that question of the contestant, who is being
played by the caller. When the caller responds through the caller's virtual performer, the
panelist "hears" the caller's answer and decides, according to pre-programmed information
available to the panelist, whether the answer is correct or not.

If it is determined that the answer is not correct, the panelist notifies stage 506
that the caller has answered incorrectly. Stage 506 in turn notifies the audience, the other
virtual performers, stage manager 500, etc. Flow script 502 then directs the flow of the
telephone show to an appropriate bubble outline dealing with incorrect responses.

If the answer is correct, the panelist announces that the caller has answered
correctly and notifies stage 506 accordingly. Stage 506 in turn notifies the audience, the
other virtual performers, stage manager 500, etc. Flow script 502 then directs the flow of
the telephone show to an appropriate bubble outline dealing with correct responses.

Play may continue in this fashion, with quips, retorts, questions, and answers,
until stage manager 500 decides that the flow script's show ending criteria are met and
whether the caller has won or lost.

Reference is now made to Fig. 6, which is a simplified block diagram illustration
of a telephone entertainment system, constructed and operative in accordance with a
preferred embodiment of the present invention. The system of Fig. 6 includes a telephony
interface 600 interfacing with a speech/voice layer 602, which includes a speech processor
604 capable of performing automatic speech recognition, a module 606 for facilitating input
and output, such as a VoiceXML module, an audio playout module 608 for producing
audio output, and a call control module for interfacing between VoiceXML module 606 and
telephony interface 600. Speech/voice layer 602 in turn communicates with a presentation
layer 610 where flow script bubbles 612 are prepared for output and where call state
information 614 is maintained. A servlet controller 616 uses Java® Server Pages (JSP) to
populate pre-defined VoiceXML templates 618 with links to audio content 620 in order to
carry out bubbles 612. A cache manager 622 identifies and caches flow script segments for
rapid audio content retrieval and play out.
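
By way of non-limiting illustration only, the population of a VoiceXML template with links to audio content may be sketched as follows. The sketch uses Python string formatting in place of JSP, and the function name render_vxml, the audio base URL, and the ".wav" file naming are hypothetical. The form, block and audio elements are standard VoiceXML 2.0 elements, but the template shown is not the template of the described system.

```python
from xml.sax.saxutils import quoteattr


def render_vxml(bubble_refs, audio_base_url):
    """Build a minimal VoiceXML document that plays the given bubbles in order."""
    audio_lines = "\n".join(
        # quoteattr escapes the URL and wraps it in quotes for use as an attribute.
        "      <audio src={}/>".format(quoteattr(f"{audio_base_url}/{ref}.wav"))
        for ref in bubble_refs
    )
    return (
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n"
        "    <block>\n"
        f"{audio_lines}\n"
        "    </block>\n"
        "  </form>\n"
        "</vxml>\n"
    )
```

Because the bubbles are emitted as consecutive audio elements within a single block, they would be played one after the other, in keeping with the bubble string behavior described hereinabove.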

Presentation layer 610 communicates with a game engine layer 624 where flow
scripts 626 are processed and where software agents, representing virtual performers 628, a
virtual theater stage 630, and a stage manager 632, operate. Game engine layer 624 uses a
data layer 634 for storing/retrieving game variables and other show data such as real-world
statistics, language models and behavior information, in a show data store 640, and for
recording caller history in a history store 636, and for accessing user profile information
638. System administration is provided via a management application 642, while back-end
application 644 may include functions such as generating reports.

The system of Fig. 6 is preferably implemented using a scalable, multi-tier
framework such as that specified by Java® 2 Platform, Enterprise Edition. The engine
components of the system of Fig. 6 are preferably implemented using an agent-oriented
architecture. Communication between agent components is preferably implemented using a
scalable messaging system, such as a CORBA-compliant messaging system or the Java®
Message Service (JMS), and system data are preferably stored using a scalable persistence
framework, such as that specified by the Java® 2 Platform, Enterprise Edition. The system
may be implemented using application servers compliant with this framework, such as the
WebLogic server product family, commercially available from BEA Systems, Inc. The data
layer 634 may be implemented using a scalable relational or object-relational database, such
as those commercially available from Oracle Corporation or Microsoft Corporation.

Telephony interface 600 preferably includes conventional hardware and
software components for interfacing with telephony networks, such as PSTN networks,
including PRI or SS7 signaling capabilities for more robust call control, such as those
commercially available from Intel Corporation or NMS Communications. Additionally or
alternatively, VoIP networks with SIP or H.323 may be supported. Telephony interface 600
is preferably configured to support land and mobile telephone handsets, and VoIP devices.
Although these devices primarily send and receive voice communications, they may
additionally support other modalities, such as SMS, WAP, MMS, that may be used for
providing alternate caller data input and output routes for telephone shows. Telephony
interface 600 may also be configured for use with more intelligent handsets, such as those
configured with Java® 2 Platform, Micro Edition or Qualcomm BREW™ functionality.

Such handsets may function as players for media protocols such as SMIL, as well as having
some internal speech recognition capability. The capabilities of the next generation
networks, such as UMTS, and of the intelligent handsets will improve both the quality of the
audio and the scalability of the overall system, as compared to a typical connection-based
telephone network. In order to support such intelligent handsets, presentation layer
610 may be configured with JSP Templates to support the necessary protocols such as
SMIL, in addition to VoiceXML.

Additional reference is now made to Fig. 7, which is a simplified block diagram
illustration of selected elements of the telephone entertainment system of Fig. 6, constructed
and operative in accordance with a preferred embodiment of the present invention. In Fig.
7, speech/voice layer 602 (Fig. 6) is shown including a VoiceXML interpreter 700, such as
VWS 1.3, commercially available from Nuance Corporation, and a CallControl and
Telephony Middleware component 702, such as Nuance's Rec Client, which accesses an
audio cache 704 via an HTTP interface 706, such as an Apache HTTPD application. An
ASR module 708, such as Nuance's Rec Server, provides speech/voice layer 602 with
automatic speech recognition. A telephony interface 716 preferably includes standard
hardware and/or software, such as is commercially available from Intel Corporation, NMS
Communications, or VocalTec, for interfacing with telephone and/or VoIP networks.

Presentation layer 610 and game engine layer 624 are both shown installed on
an application server 710, such as a J2EE server, while data layer 634 is shown installed on
database server 712, such as an Oracle 9i server. Communications between presentation
layer 610 and game engine layer 624 may be carried out using Java® beans 714 as Value
Objects. Communications between the game engine layer 624 and the data layer 634 may
be carried out using Java® beans 715 as Data Objects.

The ASR processing of caller responses at plot points is preferably done using 
grammars. Typically, a caller will have a small set of decision choices that he may make at
the plot point. Thus, a grammar may be generated that encompasses the spoken language
variants for these choices. For example, in the case of the baseball show, the caller may
choose a fast ball pitch using any of the following responses: "Fast", "Fast Ball", "I wanna
Fast Ball", "Throw a um Fast Ball". Conventional ASR techniques, such as Finite Grammar
Patterns and Stochastic Language Models (N-Gram), may be used in this regard, and
multiple languages may be supported using conventional techniques.
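
As a rough illustration of the grammar idea, the following Java sketch maps several spoken variants onto a single plot-point choice. The class name `PitchGrammar`, the filler-word list and the choice labels are assumptions made for this example only; a deployed system would instead hand a grammar to the ASR engine rather than match text itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical finite grammar mapping spoken variants to a plot-point choice. */
class PitchGrammar {
    // Filler words dropped before matching (an assumption for this sketch).
    private static final Set<String> FILLERS =
            new HashSet<>(Arrays.asList("i", "wanna", "throw", "a", "um"));

    // Key phrase -> choice label (labels are illustrative).
    private static final Map<String, String> PHRASES = new HashMap<>();
    static {
        PHRASES.put("fast", "FAST_BALL");
        PHRASES.put("fast ball", "FAST_BALL");
        PHRASES.put("curve", "CURVE_BALL");
        PHRASES.put("curve ball", "CURVE_BALL");
    }

    /** Returns the choice label for an utterance, or null if nothing matches. */
    static String recognize(String utterance) {
        StringBuilder kept = new StringBuilder();
        for (String word : utterance.toLowerCase().split("[^a-z]+")) {
            if (!word.isEmpty() && !FILLERS.contains(word)) {
                if (kept.length() > 0) {
                    kept.append(' ');
                }
                kept.append(word);
            }
        }
        return PHRASES.get(kept.toString());
    }
}
```

Under these assumptions, all four responses quoted above resolve to the same label, while an utterance outside the grammar yields no match and could trigger a re-prompt.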

Presentation layer 610, as the interface layer between the game engine and the 
speech/voice layer, is responsible for all voice-user interface (VUI) flow management, and 
the generation of the dynamic VoiceXML. Presentation layer 610 may be based on the 
J2EE JSP Model 2 Architecture. In this model, servlet controller 616 provides the entry
point for requests to the system. Servlet controller 616 then interfaces with Java beans 714
for application logic, and then forwards the request to the appropriate JSP.

The handling of requests and the JSPs are typically defined in an XML file. This
file defines:

• Actions

o Actions define the group of tasks and JSPs that the controller will use for a
specified request. The action can consist of a pretask, a posttask and a JSP. The
action definition may also include a switch statement, which allows the
controller to select the nextAction based on a return code of the task.

• PreTasks

o PreTasks are Java® classes that implement functionality that should be
performed before the controller forwards the request to the next JSP.

• PostTasks

o PostTasks are Java® classes that implement functionality that should be
performed on a response to the current JSP.

• JSPs

o The JSPs provide templates for building the dynamic VoiceXML.

• Events

o Events define global actions that may be accessed across dialogs. This includes
handling global events such as a request for help. The event mechanism is
supported through a context stack, which allows presentation layer 610 to
manage context.

• Exceptions

o Presentation layer 610 defines actions to handle both run time as well as
application exceptions.

The following is an example of a presentation layer definition XML:

<fluidxml>
  <actions>
    <action name="start">
      <pretask class="com.zow.dev.StartSoccer">
        <switch>
          <branch return="TIME_IS_UP" nextAction="end"/>
        </switch>
      </pretask>
      <jsp path="present.jsp"/>
    </action>
    <action name="play">
      <pretask class="com.zow.dev.PlaySoccer">
        <switch>
          <branch return="INAPPROPRIATE" nextAction="inappropriate"/>
          <branch return="TIME_IS_UP" nextAction="end"/>
        </switch>
      </pretask>
      <jsp path="present.jsp"/>
    </action>
    <action name="inappropriate">
      <jsp path="inappropriate.jsp"/>
    </action>
    <action name="hangup">
      <pretask class="com.zow.dev.Hangup"/>
      <jsp path="exit.jsp"/>
    </action>
    <action name="end">
      <pretask class="com.zow.dev.EndSoccer"/>
      <jsp path="presentEnd.jsp"/>
    </action>
  </actions>
</fluidxml>



The interface functionality between presentation layer 610 and game engine
layer 624 is specified by specific preTasks. The user response at each plot point is handled
by a presentation layer task, which then invokes game engine layer 624. Game engine layer
624 uses GDML and game logic to compute the next transition and plot point, and returns
all audio prompt, dialog, and grammar information needed by presentation layer 610 to
execute the JSP using a Java® Bean. The JSP then provides a template for the dynamic
generation of VoiceXML.
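
The switch mechanism of the presentation layer definition XML shown earlier can be sketched in Java as follows. This is a minimal illustration rather than the actual controller: the `ActionDef` class and its methods are hypothetical names, the action table is built by hand instead of being parsed from the XML file, and the pretask of a branched-to action is ignored for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory form of one action element of the definition file. */
class ActionDef {
    final String name;
    final String jspPath;
    // Pretask return code -> name of the action to continue with.
    final Map<String, String> branches = new HashMap<>();

    ActionDef(String name, String jspPath) {
        this.name = name;
        this.jspPath = jspPath;
    }

    ActionDef branch(String returnCode, String nextAction) {
        branches.put(returnCode, nextAction);
        return this;
    }

    /**
     * Resolves the JSP to forward to, given the pretask's return code.
     * If a branch matches, the named action's JSP is used; otherwise
     * the current action's own JSP is used.
     */
    static String resolveJsp(Map<String, ActionDef> actions,
                             String actionName, String returnCode) {
        ActionDef action = actions.get(actionName);
        String next = action.branches.get(returnCode);
        return next != null ? actions.get(next).jspPath : action.jspPath;
    }

    /** Builds a table mirroring the "play", "inappropriate" and "end" actions. */
    static Map<String, ActionDef> soccerActions() {
        Map<String, ActionDef> actions = new HashMap<>();
        actions.put("play", new ActionDef("play", "present.jsp")
                .branch("INAPPROPRIATE", "inappropriate")
                .branch("TIME_IS_UP", "end"));
        actions.put("inappropriate",
                new ActionDef("inappropriate", "inappropriate.jsp"));
        actions.put("end", new ActionDef("end", "presentEnd.jsp"));
        return actions;
    }
}
```

In this sketch a return code with no matching branch, such as a hypothetical "OK", simply falls through to the action's own JSP.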

Reference is now made to Fig. 8, which is a simplified UML collaboration 
diagram of elements of the presentation layer shown in Figs. 6 and 7, operative in
accordance with a preferred embodiment of the present invention. In the diagram of Fig. 8,
the VoiceXML interpreter, having processed the VoiceXML specified by the JSP of the
previous action, submits a request representing interpreted input from the user to the servlet
controller (801). The servlet controller retrieves all relevant information from the
presentation layer ActionsHolder and pops the session context (802) from the presentation
layer stack. The servlet controller then gets a list of the post-tasks of the previous action
(803) and performs them (804, 805). This mechanism allows for specifying processing of
the request in the previous context, such as in validating input. The new session context is
then pushed (806) on the presentation layer stack, and the list of pre-tasks is retrieved
(807). The controller then performs the pre-tasks (808, 809). The controller forwards the
flow to the next JSP to be presented (810, 811). As one can see, though the handling of a
request always occurs between JSPs, the pre-task/JSP/post-task flow is formed by the fact
that the controller does the post-task of the previous action before starting the next flow.
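
The ordering described for Fig. 8 can be sketched in Java as below. The `FlowController` class, its nested `Action` type and the `trace` list are hypothetical and exist only to make the sequence visible; real tasks would be Java classes loaded from the definition file rather than trace strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical controller showing the post-task / pre-task ordering. */
class FlowController {
    /** Minimal stand-in for an action; only its name is modelled here. */
    static class Action {
        final String name;

        Action(String name) {
            this.name = name;
        }
    }

    private final Deque<Action> contextStack = new ArrayDeque<>();
    final List<String> trace = new ArrayList<>();

    /** Handles one request from the VoiceXML interpreter and moves to next. */
    void handleRequest(Action next) {
        // Pop the context of the action whose JSP produced this request, if any.
        Action previous = contextStack.poll();
        if (previous != null) {
            trace.add("post:" + previous.name); // e.g. validate the caller's input
        }
        contextStack.push(next);                // push the new session context
        trace.add("pre:" + next.name);          // e.g. call the game engine
        trace.add("jsp:" + next.name);          // forward to the JSP template
    }
}
```

Handling a "start" request followed by a "play" request records the post-task of "start" only when the second request arrives, which is the between-JSPs behaviour described above.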

Reference is now made to Fig. 9, which is a simplified UML sequence diagram 
of a method of operation of the system of Figs. 6 and 7, operative in accordance with a 
preferred embodiment of the present invention. The UML sequence diagram of Fig. 9 
shows the flow of a telephone show call through the system, including the flow of the call
from the caller, through the telephony interface, VoiceXML and ASR to the game engine
components. The flow describes the initiation of a call, and the subsequent audio content 
playout and voice command cycle. 

In the method of Fig. 9 the caller initiates a telephone call to a telephone show 
by dialing the telephone number associated with the telephone show. The call arrives at the
voice interface, and is represented by a callConnect message flow. The voice interface then
informs the CallControl and Telephony Middleware that a new telephone session has
begun, which then informs the VoiceXML interpreter of the new session, as represented by
a sessionInit message. The VoiceXML interpreter sends a request to the presentation layer,
as represented by a FirstRequest message. The presentation layer responds to the
VoiceXML interpreter with an initial VoiceXML document that specifies the initial user
voice interaction. The VoiceXML interpreter processes this initial VoiceXML document
and requests from audio cache or other storage any required audio segments indicated by
the URLs in the VoiceXML document. The VoiceXML interpreter then passes the audio
content to the CallControl, and sets up a channel for receiving speech recognition data. The
audio content is then played out over the voice interface through the telephone network to
the caller's phone.

When the caller responds to a plot point by giving a voice command, the caller's 
speech flows from their phone over the telephony network to the voice interface and to the 
CallControl's speech channel that is listening for voice input. This voice input is then sent to
the ASR server for speech recognition. The result is then sent to the VoiceXML interpreter.
Based on the VoiceXML document, the VoiceXML interpreter then sends a new request
with the interpreted speech results to the presentation layer. The presentation layer's
controller typically invokes a task which calls the game engine with the interpreted caller
command. The game engine then determines the flow of the show, based on the caller's
command, the engine logic, game state, and game history. The game engine then provides
the presentation layer with the next plot point and transition, and all the necessary
information for the presentation layer to construct the response VoiceXML document. The 
presentation layer controller then invokes the JSP which delivers the VoiceXML document 
to the VoiceXML interpreter. The call then continues with the playout of audio content and 
listening for further voice commands as described above. 

Reference is now made to Fig. 10, which is a simplified UML activity diagram
of a method of operation of the system of Figs. 6 and 7, operative in accordance with a
preferred embodiment of the present invention. The UML activity diagram of Fig. 10 
shows the interface and the division of responsibility between the layers of the system for a 
GDML-based game engine. The diagram shows the delivery of dynamic VoiceXML to the
VoiceXML interpreter by the system in connection with the handling of a caller voice
command at a plot point. The VoiceXML interpreter forwards the input data as an HTTP 
request to the presentation layer. The presentation layer controller then invokes the 
necessary task. This task then calls the game engine. The game engine maintains the GDML 
and the current plot point information. The game engine then takes the caller selection, the
plot point state information, and the input of the NPC and constructs a choice of the next
plot point and transition based on the GDML specification. The game engine filters the 
relevant roles and determines the binding of context. Using this information, the bubbles are 
completely specified. The game engine returns this information to the presentation layer 
task through the Value Object 714. The game task uses the content manager to bind the full
path to the bubble files. The controller then invokes the JSP, which generates the response
VoiceXML document. This VoiceXML document is then returned to the VoiceXML 
interpreter which processes the document. 

Reference is now made to Fig. 11, which is a simplified block diagram 
illustration of a method of multi-player operation of the system of Figs. 6 and 7, operative in
accordance with a preferred embodiment of the present invention. In the diagram of Fig. 11,
two callers 1100 and 1102 are each provided with a separate logical set of telephony
resources, including separate telephony ports 1104 and 1106 and separate VoiceXML
sessions 1108 and 1110. Each VoiceXML session is in turn handled by a separate
presentation layer session 1112 and 1114. Both presentation layer sessions interface with a
common instance of a game engine 1116. Appropriate data objects represent each caller's
history and data. In a two-player telephone show, each player listens to the transition audio
content. The transition content can be the same for both players or different. 

A community manager 1118 may be provided to allow callers wishing to 
participate in a multi-player game to find each other and arrange game parameters, such as 
roles, game times, etc. Community manager 1118 may be used to manage league play, such
as is described hereinbelow with reference to Fig. 17.

An example of a two-player telephone show is a two-player version of the 
soccer show described above. Using GDML context attributes, the transition descriptions 
of a soccer game can have identical sportscaster descriptions for both the caller on offense 
and the caller on defense, whereas the description of the plot point choices and game
situation will be different based on the current role of each caller. Both callers are
synchronized at the plot points, and the show continues when both callers have responded 
to their prompts at the plot point, whereupon the system delivers the next transition. 
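
The synchronization rule, under which the show continues only when both callers have responded, might be sketched in Java as follows. `PlotPointBarrier` and its method names are hypothetical, and the sketch leaves out timeouts and disconnected callers, which a real system would need to handle.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical plot-point barrier for a multi-player show. */
class PlotPointBarrier {
    private final int players;
    private final Map<String, String> responses = new HashMap<>();

    PlotPointBarrier(int players) {
        this.players = players;
    }

    /**
     * Records one caller's response at the current plot point.
     * Returns all collected responses once every caller has answered
     * (and resets for the next plot point); returns null while waiting.
     */
    synchronized Map<String, String> respond(String callerId, String choice) {
        responses.put(callerId, choice);
        if (responses.size() < players) {
            return null;
        }
        Map<String, String> all = new HashMap<>(responses);
        responses.clear();
        return all;
    }
}
```

The first caller to answer receives null and keeps waiting; the second caller's answer releases the full set of responses, at which point the game engine can select the next transition.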

Reference is now made to Fig. 12, which is a simplified UML activity diagram 
of a method of multi-player operation of the system of Figs. 6 and 7, operative in
accordance with a preferred embodiment of the present invention. The UML activity
diagram of Fig. 12 describes the side-by-side flow of a two-caller telephone show. In a 
two-player show, a separate telephone interface and call control, as well as a VoiceXML 
interpreter session, is provided for each call. Each caller's experience is specified by the 
respective VoiceXML documents generated by the presentation layer, which maintains
separate sessions for each caller. In the method of Fig. 12, each caller separately hears the
transition and plot point prompts, and then gives a plot point voice command. The game 
engine interprets the voice commands, decides on the next plot point and transition, and 
through the presentation layer delivers to each caller their respective next VoiceXML 
document. The game engine then synchronizes the users at plot points as needed. 

Reference is now made to Fig. 13, which is a simplified UML collaboration
diagram of a game engine, constructed and operative in accordance with a preferred
embodiment of the present invention. The game engine of the present invention includes the
following components: 

• GameInstance Manager 1308

o The GameInstance Manager 1308 is the interface of the engine to the
presentation layer task, and is responsible for coordinating the steps in the
flow of a show.

• GameInstance Engine 1309

o The GameInstance Engine 1309 is the persistent, stateful object that is
responsible for the logic and flow of a specific show. The GameInstance
implementation will embody the logic and state for each kind of show.
Binding of context is done by the GameInstance Engine 1309.

• TimerUtil

o The TimerUtil is a utility which allows the GameInstance Manager 1308 to
maintain the show clock.

• UserBean

o The UserBean represents the identity and profile of the caller, such as
league and preferences, and the interface to User Management.

• UserHistory

o The UserHistory represents the collection of the past shows the caller has
played. This object maintains the history and the set of plot points traversed,
historical engine state and results such as game score.

• NPC

o The NPC is the non-player-character object that is responsible for simulating
an opponent in a two or more player game.

• Content Manager

o The Content Manager provides the necessary mappings for the GDML, and
maps the bubbles to audio content paths.

• GDML

o The GDML object maintains the show GDML Flow Script.

• Configuration

o The Configuration object maintains show configuration, such as team
selection and NPC parameters.

The UML collaboration diagram of Fig. 13 illustrates the interaction between
game engine components in processing a caller voice command at a plot point. In Fig. 13
processing flow starts with a presentation layer task, such as "Play Soccer." The task calls
the game engine through the GameInstance Manager 1308 (1301), specifying the game and
the caller command. The GameInstance Manager 1308 is the mediator of the components,
"orchestrating" the flow of control and information. The GameInstance Manager 1308
validates the input (1302) and begins a transaction to enable state persistence of the
components with transactional integrity (1303). The GameInstance then queries the NPC
Manager for the NPC response at the plot point (1304). The GameInstance Manager 1308
then calls the GameInstance Engine 1309 (1305) to process the caller command and the
NPC response and determine the next plot point and transition. The GameInstance Manager
1308 (1306) then uses the TimerUtil to determine remaining game time, and the
ContentManager (1307) to complete the information needed by the presentation layer to
invoke the JSP and deliver the VoiceXML document response.

The game engine of the present invention may be implemented using J2EE 
design patterns as follows: 

• The GameInstance Manager 1308 is implemented by using a mediator pattern and a
Session Bean.

• The GameInstance Engine 1309 is implemented using a facade pattern, providing
session beans for the game logic and state retrieval, and an entity bean for
maintaining the game state.

• The UserBean object is implemented using a facade pattern. The session bean
provides the access methods for the underlying entity beans. The user object also
encapsulates integration with external User Management systems.

• The UserHistory object is implemented using a facade pattern. The session bean
provides the access methods for the underlying entity beans. The user history object
also encapsulates integration with external User Management systems.

• The NPC object is implemented using a facade pattern. The logic is implemented in
a session bean, and the persistence allows the NPC to react based on the user's
history.

• The Content Manager is implemented as a JNDI serialized Java® Object to be
available in a clustered configuration.

• The GDML object is implemented as a JNDI serialized Java® data structure. This
data structure is validated and uses a hash table for fast access.

• The Configuration object is implemented as a JNDI serialized Java® data structure.

The GameEngine preferably maintains a persistent state that includes:

• Plot points traversed

• Show-instance-specific state, such as game score, virtual performer mood, or caller
mood

• Show-instance-specific results and statistics.

The show-instance-specific data is preferably described as name-value pairs using
ontologies as described hereinabove with reference to data, rules and messages.

The UserBeans may use connectors such as in the Java® Connector Architecture 
to provide integration with back-end systems such as user management systems, Web 
applications and SMS. 

Reference is now made to Fig. 14, which is a simplified pictorial illustration of
aspects of virtual performer implementation, operative in accordance with a preferred
embodiment of the present invention. The game engine of the present invention delegates
roles specified in the GDML flow script to virtual performers. The roles themselves are
specified as semantic messages in the GDML flow script. The virtual performers receive the
semantic messages, and return BubbleString, Bubbles or BubbleClasses, and optionally
update their state. In Fig. 14 the GameInstance Manager 1308 receives a show initiation
request from the presentation layer. The GameInstance Manager 1308 then calls a specific
GameInstance which delegates the flow script roles to the virtual performers by sending
semantic messages to them. Each virtual performer evaluates each semantic message,
updates its state, and returns a specification for a Bubble, BubbleString, or BubbleClass to
the GameInstance. The GameInstance Manager 1308 then returns presentation information
to the presentation layer through a Java® bean.

In the following example of a GDML flow script, a baseball sportscaster is
requested to respond to the action of a batter stepping away from the batting box. This
message is defined in a baseball game ontology, and the sportscaster virtual performer
maintains a rule to respond to this event.

<!-- Beginnings -->
<transition name="initialTr" starting="nullPlp" ending="01Plp"
    cop="false" changesScore="false">
  <bubbleString role="both">
    <vpMessage name="sportscaster">
      <message communicative-act="request">
        <sender name="BOTN-Engine"/>
        <receiver name="sportscaster"/>
        <content>
          Baseball::BatterLeavesBox
        </content>
        <ontology name="Baseball"/>
      </message>
    </vpMessage>
    <bubble reference="Nullto01" context="defense"/>
  </bubbleString>
</transition>
<!-- Endings -->

For example, the response of the virtual performer sportscaster in this case could be:

<bubble reference="BatterLeavesBox" context="general"/> 
This response defines at the GDML level the BubbleClass that the Sportscaster virtual 
performer chooses to deliver. 

Reference is now made to Fig. 15, which is a simplified pictorial illustration of
aspects of virtual performer implementation, operative in accordance with a preferred
embodiment of the present invention. In Fig. 15 a virtual performer is shown basing his
behavior on his rule set, his local history, and the game state information maintained by the
GameInstance. The rules are typically specified as a response to a message, which specifies
a list of actions. These actions can be:

• Send Bubble/BubbleClass/BubbleString 

• Invoke internal Java® method 

• Update a state variable in the virtual performer. 

The virtual performer state is preferably described as a name-value pair using pre-defined
ontologies. These ontologies preferably include the domain ontology of the show, as well
as general ontologies that describe mood and emotion, such as the HumanML ontology.

The author of the telephone show preferably specifies these rules as declarative
rules when defining the character of the virtual performer. These rules may be specified
using a standard syntax such as BRML or RuleML. The virtual performer implementation
includes the mechanism to trigger these rules and correctly evaluate them using an interface
such as that specified by the Java® Rule Engine API in JSR 94.
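
The rule mechanism, in which a message triggers a list of actions, could be sketched in plain Java as below. This does not use the JSR 94 API or any rule syntax such as RuleML; the `VirtualPerformer` class, its `Action` interface and the "mood" state variable are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical virtual performer driven by message-triggered rules. */
class VirtualPerformer {
    /** An action either emits a bubble reference or updates a state variable. */
    interface Action {
        void apply(VirtualPerformer vp, List<String> bubblesOut);
    }

    // Performer state as name-value pairs.
    final Map<String, String> state = new HashMap<>();
    // Message name -> ordered list of actions to run.
    private final Map<String, List<Action>> rules = new HashMap<>();

    static Action sendBubble(String reference) {
        return (vp, out) -> out.add(reference);
    }

    static Action setState(String name, String value) {
        return (vp, out) -> vp.state.put(name, value);
    }

    void addRule(String message, Action... actions) {
        List<Action> list = rules.computeIfAbsent(message, k -> new ArrayList<>());
        for (Action a : actions) {
            list.add(a);
        }
    }

    /** Runs every action registered for the message; returns bubbles to deliver. */
    List<String> receive(String message) {
        List<String> bubbles = new ArrayList<>();
        List<Action> matched = rules.get(message);
        if (matched != null) {
            for (Action a : matched) {
                a.apply(this, bubbles);
            }
        }
        return bubbles;
    }
}
```

A sportscaster configured with one rule for `Baseball::BatterLeavesBox` would return the `BatterLeavesBox` bubble reference and could update its own state in the same step; messages without a rule produce no bubbles.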

Show history and state information is preferably stored and retrieved as name-value
pairs or tuples according to a pre-defined ontology. Virtual performers preferably use
known agent learning techniques, which provide for adaptation to caller behavior. For
example, a virtual performer baseball batter might observe that a caller consistently chooses to
throw a sequence of fast ball, curve ball, and slider. The batter would then infer a new rule
to anticipate a slider following a fast ball and a curve ball.
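
One simple way to realize this kind of adaptation is to count which pitch has followed each pair of preceding pitches. The Java sketch below takes that approach; it is only one possible learning technique, chosen here for brevity, and `PitchPredictor` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical batter that learns which pitch tends to follow the last two. */
class PitchPredictor {
    private final List<String> history = new ArrayList<>();
    // "previous two pitches" -> (next pitch -> times observed)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Records a pitch and updates the follow-up counts. */
    void observe(String pitch) {
        int n = history.size();
        if (n >= 2) {
            String context = history.get(n - 2) + "," + history.get(n - 1);
            counts.computeIfAbsent(context, k -> new HashMap<>())
                  .merge(pitch, 1, Integer::sum);
        }
        history.add(pitch);
    }

    /** Returns the most frequent follow-up to the last two pitches, or null. */
    String predictNext() {
        int n = history.size();
        if (n < 2) {
            return null;
        }
        String context = history.get(n - 2) + "," + history.get(n - 1);
        Map<String, Integer> seen = counts.get(context);
        if (seen == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : seen.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

After observing fast ball, curve ball, slider and then fast ball, curve ball again, the predictor anticipates a slider, matching the inferred rule in the example.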

Virtual performers are preferably implemented as distributed objects on the 
framework described above. Typically, in a J2EE implementation, a virtual performer will
consist of a session bean to encapsulate the character functionality, an entity bean to
encapsulate persistence of state, and a message-driven bean to allow the virtual performer
to react and change state based on events. Requests to a virtual performer are preferably 
implemented using an XML binding of FIPA-style interaction protocol messages. 

Reference is now made to Fig. 16, which is a simplified pictorial illustration of
aspects of virtual theater implementation, operative in accordance with a preferred
embodiment of the present invention. In Fig. 16 a stage manager manages the narrative
flow of a flow script and delegates the roles to virtual performers. The stage manager controls
the flow of the GDML flow script, and requests the virtual performers to respond to
the flow script messages. The virtual performers use their history and rules to determine
their response, and can consult the stage. The stage maintains a set of global name-value
pairs which describe the state of the show, the actions other virtual performers are
performing, and the responses of the caller. The caller, through the virtual performer he 
controls, makes plot point decisions that the stage manager uses to interpret the direction of 
flow script execution. 

Reference is now made to Fig. 17, which is a simplified flowchart illustration of
community-based telephonic entertainment, operative in accordance with a preferred
embodiment of the present invention. A telephone show community may be created by
recording the history of each caller's performance with respect to a telephone show and
then making this history, and optionally the caller's profile information, available to other
callers. This serves to establish caller reputation and identity that is reinforced in the
telephone show, particularly in multi-player shows. The caller's information is preferably
made available based on permissions that the caller provides during their participation in a
show. In the context of a competitive telephone show, such as a sports show, this may be
extended to maintaining leagues in which callers compete with each other. The callers may
then be ranked based on their performance in the show. For example, in the case of a
baseball show, callers may be scored based on their role as a pitcher. Statistics such as
earned run average may then be used for ranking callers. Tournament play may be provided
where the selection of the opposing team and the level of difficulty may be determined by
the caller's success in the tournament. Callers may advance from round-robin play to
quarterfinals, semi-finals and grand finals. In the context of an adventure show, callers may
be made aware of another caller's choice of role and ability in the show. This caller
information may be made available to the other callers in the context of the show, and/or
through supplementary media such as on a community web site, or via WAP or SMS
messaging. The method of Fig. 17 may be implemented within the context of back-end
application 644 (Fig. 6).
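
Ranking by earned run average can be sketched in Java as follows, using the standard baseball definition of ERA as nine times earned runs divided by innings pitched, where a lower value is better. The `LeagueTable` and `PitcherStats` names are hypothetical, and the sketch assumes each caller has pitched at least one inning.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical league table ranking callers by earned run average (ERA). */
class LeagueTable {
    static class PitcherStats {
        final String callerId;
        final int earnedRuns;
        final double inningsPitched;

        PitcherStats(String callerId, int earnedRuns, double inningsPitched) {
            this.callerId = callerId;
            this.earnedRuns = earnedRuns;
            this.inningsPitched = inningsPitched;
        }

        /** ERA = 9 * earned runs / innings pitched; lower is better. */
        double era() {
            return 9.0 * earnedRuns / inningsPitched;
        }
    }

    /** Returns caller ids ordered from best (lowest ERA) to worst. */
    static List<String> rank(List<PitcherStats> stats) {
        List<PitcherStats> sorted = new ArrayList<>(stats);
        sorted.sort(Comparator.comparingDouble(PitcherStats::era));
        List<String> ids = new ArrayList<>();
        for (PitcherStats s : sorted) {
            ids.add(s.callerId);
        }
        return ids;
    }
}
```

A league service could call `rank` after each show and publish the resulting order through the community channels mentioned above.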

It is appreciated that one or more of the steps of any of the methods described
herein may be omitted or carried out in a different order than that shown, without departing
from the true spirit and scope of the invention.

While the methods and apparatus disclosed herein may or may not have been 
described with reference to specific computer hardware or software, it is appreciated that 
the methods and apparatus described herein may be readily implemented in computer
hardware or software using conventional techniques.

While the present invention has been described with reference to one or more 
specific embodiments, the description is intended to be illustrative of the invention as a 
whole and is not to be construed as limiting the invention to the embodiments shown. It is 
appreciated that various modifications may occur to those skilled in the art that, while not
specifically shown herein, are nevertheless within the true spirit and scope of the invention.



WO 03/019917

PCT/IL02/00712

CLAIMS

What is claimed is: 

1. A method for operating a telephone entertainment program, the method
comprising:

a) receiving a voice communication from at least one caller;

b) selecting audio output in accordance with an audio entertainment
program;

c) presenting said audio output to said caller;

d) prompting said caller for input at a plot point of said audio entertainment
program;

e) receiving said input from said caller;

f) selecting audio output at least partly in accordance with said audio
entertainment program and said input; and

g) presenting to said caller said audio output selected in step f).

2. A method according to claim 1 and further comprising performing steps d)
through g) a plurality of times for a plurality of plot points of said audio entertainment
program.

3. A method according to claim 1 wherein said selecting step f) comprises:

applying decision logic to said input, thereby determining a state of said audio
entertainment program; and

selecting said audio output at least in part according to a predetermined
association with said state.

4. A method according to claim 1 wherein any of said receiving steps comprises
receiving audio input.

5. A method according to claim 1 wherein any of said receiving steps comprises
receiving text-based input.

6. A method according to claim 1 wherein any of said selecting and presenting
steps comprises selecting and presenting text-based input.

7. A method according to claim 1 and further comprising maintaining a history of
said caller inputs, and wherein said selecting step f) comprises selecting at least in part in
accordance with said history.

8. A method according to claim 1 and further comprising operating a plurality of
virtual performers, and wherein any of said selecting steps comprises any of said virtual
performers determining a state of said audio entertainment program and selecting at least
part of said audio output according to a predetermined association with said state.

9. A method according to claim 1 and further comprising operating a game 
simulation engine operative to:

apply decision logic to said input, thereby determining a state of said audio
entertainment program; and

select said audio output at least in part according to a predetermined association 
with said state. 

10. A method according to claim 9 wherein said operating step comprises applying
said decision logic in accordance with a rule structure of a game.

11. A method according to claim 9 wherein said operating step comprises applying
said decision logic in accordance with a predetermined outcome probability.

12. A method according to claim 1 and further comprising:

conducting said audio entertainment program for each of a plurality of callers;
recording a history of the interaction of each of said callers with said audio
entertainment program; and
providing access to said histories to any of said callers.

13. A method according to claim 12 and further comprising ranking said callers
according to a characteristic of said caller's interaction with said audio entertainment
program.

14. A method according to claim 1 wherein any of said steps are performed for a
plurality of callers within the context of said audio entertainment program.

15. A method of constructing phrases from pre-recorded variants of speech
elements, the method comprising:

a) selecting a pre-recorded variant of a first speech element from a group of
pre-recorded variants of said first speech element;

b) selecting a pre-recorded variant of a second speech element from a group
of pre-recorded variants of said second speech element; and

c) constructing a phrase from said selected variants.

16. A method according to claim 15 wherein said selecting step b) comprises 
selecting where said second speech element associatively follows said first speech element. 

17. A method according to claim 15 wherein any of said selecting steps comprises
selecting any of said variants at least in part according to a predetermined association with a
relationship between a virtual performer and a caller.

18. A virtual theater architecture comprising: 

virtual performer means operative to play the role of a specific character in a
telephone show;

stage manager means operative to interpret a flow script of said telephone show
and send messages to said virtual performer means, each of said messages being a directive
of said flow script; and

stage means operative to maintain state information of said telephone show and
receive behavior exhibited by said virtual performer means responsive to receipt of any of
said messages.

19. A virtual theater architecture according to claim 18 and further comprising:

a set of behavior rules; and

a behavior history,

and wherein said virtual performer means is operative to determine its own
behavior by applying said behavior rules to any of said state information, said incoming
messages, and said behavior history.

20. A telephone entertainment system comprising: 

a telephony interface operative to interface with a caller;

speech/voice processing means operative to interface with said telephony
interface and receive input from said caller;

presentation means operative to interface with said speech/voice processing 
means and prepare output at least partly based on said input; and 
a game engine operative to interface with said presentation means and operate
at least one virtual performer in accordance with a flow script, thereby providing an output
directive to said presentation means for use in preparing said output.

21. A telephone entertainment system comprising:

a telephony interface operative to interface with a caller;

speech/voice processing means operative to interface with said telephony 
interface and including: 

a speech processor operative to perform automatic speech recognition on 
speech input received from said caller; 
a template module for facilitating input and output via templates;

an audio playout module for producing audio output to said caller;
presentation means operative to interface with said speech/voice processing 
means and including: 

means for preparing flow script bubbles for output via said audio playout
module;

means for maintaining call state information;

means for populating pre-defined templates with links to audio content in
predetermined association with said bubbles and said call state;

a game engine operative to interface with said presentation means and including: 
means for processing a flow script; 
means for operating software agents representing virtual performers in
accordance with said flow script; and

data storage means accessible to said game engine for storing and retrieving any
of game variables, user profile information, statistics, language models and behavior
information in association with the processing of said flow script.


22. A method of processing user input into an interactive telephony application 
architecture, the method comprising: 

submitting a request to a controller, said request representing interpreted input
from a user;

said controller:

retrieving information from a presentation layer relevant to said input and
popping a session context from a presentation layer stack;

retrieving a list of post-tasks of a previous action and performing said
post-tasks;

pushing a new session context onto said presentation layer stack;

retrieving a list of pre-tasks; 
performing said pre-tasks; and 

rendering output via a scripted template subsequent to performing any of
said tasks.
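By way of non-limiting illustration, the controller steps of claim 22 may be sketched as follows. The action table, the task signature, the request format, and the use of Python's `string.Template` as the scripted template are all assumptions of this sketch and are not drawn from the specification.

```python
from string import Template


class Controller:
    """Hypothetical controller following the order of steps in claim 22."""

    def __init__(self, actions, initial_action):
        self.actions = actions  # presentation-layer info keyed by action name
        self.stack = [{"action": initial_action}]  # presentation layer stack
        self.trace = []  # records which tasks ran, for inspection

    def handle(self, request):
        # Retrieve presentation-layer information relevant to the input.
        action = self.actions[request["action"]]
        # Pop the session context of the previous action.
        previous = self.stack.pop()
        # Retrieve and perform the post-tasks of the previous action.
        for task in self.actions[previous["action"]]["post_tasks"]:
            task(previous, self.trace)
        # Push a new session context for the requested action.
        context = {"action": request["action"], **request.get("params", {})}
        self.stack.append(context)
        # Retrieve and perform the pre-tasks of the new action.
        for task in action["pre_tasks"]:
            task(context, self.trace)
        # Render output via the template once the tasks have run.
        return Template(action["template"]).substitute(context)


# Hypothetical action table with one post-task and one pre-task.
actions = {
    "intro": {
        "pre_tasks": [],
        "post_tasks": [lambda ctx, trace: trace.append("post:intro")],
        "template": "Welcome to the show.",
    },
    "quiz": {
        "pre_tasks": [lambda ctx, trace: trace.append("pre:quiz")],
        "post_tasks": [],
        "template": "Here is a question for $caller.",
    },
}

controller = Controller(actions, "intro")
output = controller.handle({"action": "quiz", "params": {"caller": "Dana"}})
```

Keeping session contexts on a stack means the post-tasks always run against the context that was active when the previous output was rendered.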


23. A flow script for an audio entertainment program, the flow script comprising: 
a plurality of plot points; 

a plurality of transitions between said plot points; 

a plurality of rules for determining movement between said plot points and said
transitions; and

a plurality of output directives associated with any of said plot points and said
transitions.

24. A flow script according to claim 23 and further comprising a plurality of
messages for delegating to a plurality of virtual performers responsibility for determining
actual output based on any of said output directives.

25. A flow script according to claim 23 and further comprising a plurality of rules
for determining movement between said plot points and said transitions based on caller
input.

26. A flow script according to claim 25 and further comprising a grammar for 
interpreting said caller input. 
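By way of non-limiting illustration, a flow script according to claims 23 to 26 may be represented as a small data structure. The field names, the sample plot points, and the simplification of a grammar to a phrase-to-token lookup table are assumptions of this sketch; a deployed grammar would normally be a speech recognition grammar.

```python
from dataclasses import dataclass, field


@dataclass
class PlotPoint:
    directive: str  # output directive associated with this plot point
    grammar: dict = field(default_factory=dict)  # caller phrase -> input token
    # token -> (transition output directive, name of next plot point)
    rules: dict = field(default_factory=dict)


@dataclass
class FlowScript:
    plot_points: dict  # plot point name -> PlotPoint
    current: str       # name of the active plot point

    def advance(self, caller_input):
        """Interpret caller input and return the output directives to issue."""
        point = self.plot_points[self.current]
        token = point.grammar.get(caller_input.strip().lower())
        if token not in point.rules:
            # Input not understood: stay put and repeat the prompt directive.
            return [point.directive]
        transition_directive, next_name = point.rules[token]
        self.current = next_name
        return [transition_directive, self.plot_points[next_name].directive]


# Hypothetical three-point script.
script = FlowScript(
    plot_points={
        "door": PlotPoint(
            directive="prompt_door",
            grammar={"open it": "OPEN", "walk away": "LEAVE"},
            rules={"OPEN": ("play_creak", "hall"),
                   "LEAVE": ("play_footsteps", "street")},
        ),
        "hall": PlotPoint(directive="describe_hall"),
        "street": PlotPoint(directive="describe_street"),
    },
    current="door",
)

first = script.advance("  Open it ")
second = script.advance("hello?")
```

The directives returned here are abstract names; under claim 24 they would be delegated to virtual performers, which decide the actual output.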

27. A telephone entertainment system comprising:

a) means for receiving a voice communication from at least one caller; 

b) means for selecting audio output in accordance with an audio 
entertainment program; 

c) means for presenting said audio output to said caller; 

d) means for prompting said caller for input at a plot point of said audio
entertainment program;

e) means for receiving said input from said caller; 

f) means for selecting audio output at least partly in accordance with said 
audio entertainment program and said input; and 



g) means for presenting to said caller said audio output selected in step f).

28. A system according to claim 27 wherein said means for selecting f) comprises: 

means for applying decision logic to said input, thereby determining a state of 
said audio entertainment program; and 
means for selecting said audio output at least in part according to a
predetermined association with said state.

29. A system according to claim 27 wherein any of said means for receiving are
operative to receive audio input.

30. A system according to claim 27 wherein any of said means for receiving are
operative to receive text-based input.

31. A system according to claim 27 wherein any of said means for selecting and 
presenting are operative to select and present text-based input. 


32. A system according to claim 27 and further comprising means for maintaining a
history of said caller inputs, and wherein said means for selecting f) is operative to select at 
least in part in accordance with said history. 

33. A system according to claim 27 and further comprising a plurality of virtual
performers operative to determine a state of said audio entertainment program and select at
least part of said audio output according to a predetermined association with said state.

34. A system according to claim 27 and further comprising a game simulation
engine operative to:

apply decision logic to said input, thereby determining a state of said audio 
entertainment program; and 

select said audio output at least in part according to a predetermined association 
with said state. 


35. A system according to claim 34 wherein said game engine is operative to apply
said decision logic in accordance with a rule structure of a game.

36. A system according to claim 34 wherein said game engine is operative to apply
said decision logic in accordance with a predetermined outcome probability.
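By way of non-limiting illustration, the game simulation engine of claims 34 to 36 may be sketched with a coin-toss game. The game itself, the state names, the audio file names, and the `win_probability` parameter are hypothetical and serve only to show decision logic that combines a rule structure with a predetermined outcome probability.

```python
import random

# Hypothetical predetermined association between program state and audio.
AUDIO_FOR_STATE = {
    "win": "fanfare.wav",
    "lose": "consolation.wav",
    "invalid": "reprompt.wav",
}


def apply_decision_logic(caller_input, win_probability, rng):
    """Determine the program state from caller input.

    The rule structure of this toy game accepts only "heads" or "tails";
    the outcome is then drawn according to a predetermined probability,
    independent of which side the caller picked.
    """
    if caller_input not in ("heads", "tails"):
        return "invalid"
    return "win" if rng.random() < win_probability else "lose"


def select_audio(caller_input, win_probability, rng=None):
    """Return the determined state and the audio output associated with it."""
    if rng is None:
        rng = random.Random()
    state = apply_decision_logic(caller_input, win_probability, rng)
    return state, AUDIO_FOR_STATE[state]


sure_win = select_audio("heads", win_probability=1.0)
sure_loss = select_audio("tails", win_probability=0.0)
```

Passing a seeded `random.Random` instance as `rng` makes a given call sequence reproducible, which may be convenient when testing a flow script.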

37. A system according to claim 27 and further comprising:

means for conducting said audio entertainment program for each of a plurality
of callers;

means for recording a history of the interaction of each of said callers with said
audio entertainment program; and

means for providing access to said histories to any of said callers. 

38. A system according to claim 37 and further comprising means for ranking said
callers according to a characteristic of said caller's interaction with said audio entertainment
program.

39. A system according to claim 27 wherein any of said means are operative for a 
plurality of callers within the context of said audio entertainment program. 

40. A phrase construction architecture comprising:

a first group of pre-recorded variants of speech elements; and 
a second group of pre-recorded variants of speech elements, wherein said 
second group associatively follows said first group. 

41. A virtual theater method comprising:

operating at least one virtual performer operative to play the role of a specific 
character in a telephone show; 

interpreting a flow script of said telephone show and sending messages to said
virtual performers, each of said messages being a directive of said flow script; and

maintaining state information of said telephone show responsive to behavior
exhibited by said virtual performers responsive to receipt of any of said messages.

42. A method according to claim 41 wherein said operating step comprises applying
behavior rules of said virtual performer to any of said state information, said incoming
messages, and a behavior history for said virtual performer.